SRE Weekly Issue #390
Many apologies to my email subscribers, who have seen two accidental re-sends of old issues recently due to a weird glitch in my automation. I think I’ve gotten a handle on it, and I’ll run an internal retrospective of this incident, of course.
Articles
Is it really SRE vs platform engineer? Or is there a way platforms can take site reliability to the next level?
Jennifer Riggins — The New Stack
A surgeon delves into the key component that allows a group of skilled individuals to work effectively and safely together, using the term “heed” to describe this special interaction.
Sidenote: in a hilarious coincidence this article managed to spoil me on a movie I was in the middle of watching (Arrival) — but it also put me in a really cool mindset to watch the rest of the film.
Dr. Rob Poston
More details on Square’s outage from a couple of weeks ago (it was DNS).
Square
Azure had an interesting outage in its Australia East region involving a power failure and the order in which cooling units were restored.
Microsoft Azure
Asking “How did it make sense at the time?” is how you unlock the hidden essence of an incident. This talk compares two public incident reports to show what it looks like when you dig into this question and when you don’t.
Jacob Scott — InfoQ
In this air accident, the pilots made a seemingly inexplicable mistake.
This sentence really stood out to me, especially after reading the “How Did It Make Sense at the Time?” article:
When we inexplicably grab the wrong utensil when cooking or accidentally start taking our dirty dishes to the bathroom instead of the kitchen, we should be thankful that we aren’t responsible for a plane full of people.
Admiral Cloudberg
There’s an interesting failure mode in this one that might stand out for the Kafka admins among us:
The Kafka consumer ended up stuck in a loop, unable to stabilize fast enough before timing out and restarting the coordination process.
Jakub Oleksy — GitHub
After explaining the difference between the ITIL terms “incident management” and “problem management”, this article goes into a discussion of recent trends and whether it still makes sense to draw a distinction between the two.
Luis Gonzalez — incident.io