SRE Weekly Issue #272
Articles
Salesforce has posted a ton of information about their major outage two weeks ago.
It involved a change to their DNS system that combined with an issue in BIND daemon shutdown, which prevented the daemon from starting back up.
The analysis goes into great detail on the fact that an engineer used the Emergency Break-Fix (EBF) process to rush out the DNS configuration change.
In this case, the engineer subverted the known policy and the appropriate disciplinary action has been taken to ensure this does not happen in the future.
Thanks to an anonymous reader for pointing this out to me.
Salesforce
This article calls out the heavily blame-ridden language in the above incident analysis and the briefing given by Salesforce’s Chief Reliability Officer.
I’m dismayed to see such language from someone who is at the C-level for reliability.
“For whatever reason that we don’t understand, the employee decided to do a global deployment,” Dieken went on.
Richard Speed — The Register
…and the Twittersphere agrees with me.
If you want to blame someone, maybe try blaming the “chief availability officer” who oversees a system so fragile that one action by one engineer can cause this much damage. But it’s never that simple, is it.
@ReinH on Twitter
Another really great take on the Salesforce outage followup.
Lorin Hochstein
I like how this article covers the different roles that SREs play.
Emily Arnott — Blameless
The principles covered in this article are:
- Build a hypothesis around steady-state behavior
- Vary real-world events
- Run experiments in production
- Automate experiments to run continuously
- Minimize blast radius
Casey Rosenthal — Verica
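To make the principles listed above a bit more concrete, here is a minimal sketch of a chaos experiment loop. It is not taken from the article. The replica pool, the single-retry behavior, and the names `run_experiment`, `success_rate`, `blast_radius`, and `tolerance` are all assumptions made for illustration, and the "service" is simulated in memory rather than running in production.

```python
import random
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    baseline: float        # success rate before the fault is injected
    observed: float        # success rate while the fault is active
    hypothesis_held: bool  # True if the drop stayed within tolerance


def success_rate(healthy, total_requests, rng):
    """Simulate requests against a replica pool (hypothetical model).

    Each request goes to a random replica and gets one retry on another
    random pick (possibly the same replica). It succeeds if either is healthy.
    """
    ok = 0
    for _ in range(total_requests):
        first = rng.randrange(len(healthy))
        retry = rng.randrange(len(healthy))
        if healthy[first] or healthy[retry]:
            ok += 1
    return ok / total_requests


def run_experiment(replicas=20, blast_radius=0.1, tolerance=0.05,
                   requests=10_000, seed=0):
    rng = random.Random(seed)
    healthy = [True] * replicas

    # Steady-state hypothesis: success rate stays within `tolerance`
    # of the baseline while the fault is active.
    baseline = success_rate(healthy, requests, rng)

    # Vary a real-world event (replica failure), but only on a small
    # fraction of the pool to minimize blast radius.
    victims = rng.sample(range(replicas), max(1, int(replicas * blast_radius)))
    for i in victims:
        healthy[i] = False
    try:
        observed = success_rate(healthy, requests, rng)
    finally:
        # Always roll the fault back, even if measurement raises.
        for i in victims:
            healthy[i] = True

    held = (baseline - observed) <= tolerance
    return ExperimentResult(baseline, observed, held)


if __name__ == "__main__":
    # Because the function takes no manual input, a scheduler could call it
    # on a timer to run the experiment continuously.
    print(run_experiment())
    print(run_experiment(blast_radius=0.5))
```

In this toy model, taking down 10% of replicas should leave the hypothesis intact because the retry absorbs most failures, while taking down half the pool should violate it. A real experiment would replace the simulation with measurements from a live system.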
This post is full of thought-provoking questions on the nature of configuration changes and incidents.
Lorin Hochstein
Outages
- IBM Cloud
- Klarna
  Klarna showed users information related to other users, as detailed in this followup post.