SRE Weekly - SRE Weekly Issue #390
Many apologies to my email subscribers, who have seen two accidental re-sends of old issues recently due to a weird glitch in my automation. I think I’ve gotten a handle on it, and I’ll run an internal retrospective of this incident, of course.
Articles
Is it really SRE vs platform engineer? Or is there a way platforms can take site reliability to the next level?
Jennifer Riggins — The New Stack
A surgeon delves into the key component that allows a group of skilled individuals to work effectively and safely together, using the term “heed” to describe this special interaction.
Sidenote: in a hilarious coincidence this article managed to spoil me on a movie I was in the middle of watching (Arrival) — but it also put me in a really cool mindset to watch the rest of the film.
Dr. Rob Poston
More details on Square’s outage from a couple weeks ago (it was DNS).
Square
Azure had an interesting outage in its Australia East region involving a power failure and the order in which cooling units were restored.
Microsoft Azure
Asking this question is how you unlock the hidden essence of an incident. This talk compares two public incident reports to show what it looks like when you dig into this question and when you don’t.
Jacob Scott — InfoQ
In this air accident, the pilots made a seemingly inexplicable mistake.
This sentence really stood out to me, especially after reading the “How Did It Make Sense at the Time?” article:
When we inexplicably grab the wrong utensil when cooking or accidentally start taking our dirty dishes to the bathroom instead of the kitchen, we should be thankful that we aren’t responsible for a plane full of people.
Admiral Cloudberg
There’s an interesting failure mode in this one that might stand out for the Kafka admins among us:
The Kafka consumer ended up stuck in a loop, unable to stabilize fast enough before timing out and restarting the coordination process.
Jakub Oleksy — GitHub
After explaining the difference between the ITIL terms “incident management” and “problem management”, this article goes into a discussion of recent trends and whether it still makes sense to draw a distinction between the two.
Luis Gonzalez — incident.io