SRE Weekly Issue #272
Articles
Salesforce has posted a ton of information about their major outage two weeks ago.
It involved a change to their DNS system that, combined with an issue in the BIND daemon's shutdown, prevented the daemon from starting back up.
The analysis goes into great detail on the fact that an engineer used the Emergency Break-Fix (EBF) process to rush out the DNS configuration change.
In this case, the engineer subverted the known policy and the appropriate disciplinary action has been taken to ensure this does not happen in the future.
Thanks to an anonymous reader for pointing this out to me.
Salesforce
This article calls out the heavily blame-ridden language in the above incident analysis and the briefing given by Salesforce’s Chief Reliability Officer.
I’m dismayed to see such language from someone who is at the C-level for reliability.
“For whatever reason that we don’t understand, the employee decided to do a global deployment,” Dieken went on.
Richard Speed — The Register
…and the Twittersphere agrees with me.
If you want to blame someone, maybe try blaming the “chief availability officer” who oversees a system so fragile that one action by one engineer can cause this much damage. But it’s never that simple, is it.
@ReinH on Twitter
Another really great take on the Salesforce outage followup.
Lorin Hochstein
I like how this article covers the different roles that SREs play.
Emily Arnott — Blameless
The principles covered in this article are:
- Build a hypothesis around steady-state behavior
- Vary real-world events
- Run experiments in production
- Automate experiments to run continuously
- Minimize blast radius
Casey Rosenthal — Verica
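As a rough illustration of how a few of these principles fit together, here is a minimal Python sketch. It is not taken from the article: the function names, the handlers, and the `blast_radius` and `tolerance` parameters are all hypothetical, and it runs against simulated requests rather than production traffic. It states a steady-state hypothesis (the success rate should not drop by more than a tolerance), varies a real-world event (a dependency fault), and limits the blast radius by injecting the fault into only a small slice of requests.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExperimentResult:
    control_rate: float      # success rate with no fault injected
    experiment_rate: float   # success rate with the fault injected
    hypothesis_holds: bool   # True if steady state was preserved


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of successful outcomes; an empty group counts as healthy."""
    return sum(outcomes) / len(outcomes) if outcomes else 1.0


def run_experiment(
    requests: List[int],
    handler: Callable[[int, bool], bool],
    blast_radius: float = 0.05,   # hypothetical: share of requests exposed to the fault
    tolerance: float = 0.01,      # hypothetical: allowed drop in success rate
) -> ExperimentResult:
    """Compare a small experiment group (fault on) with a control group (fault off)."""
    n_experiment = max(1, int(len(requests) * blast_radius))
    experiment_group = requests[:n_experiment]
    control_group = requests[n_experiment:]

    control_rate = success_rate([handler(r, False) for r in control_group])
    experiment_rate = success_rate([handler(r, True) for r in experiment_group])

    # Steady-state hypothesis: the fault does not noticeably reduce the success rate.
    holds = (control_rate - experiment_rate) <= tolerance
    return ExperimentResult(control_rate, experiment_rate, holds)


def resilient_handler(request: int, dependency_down: bool) -> bool:
    """Hypothetical service that falls back to a cache when its dependency fails."""
    return True


def fragile_handler(request: int, dependency_down: bool) -> bool:
    """Hypothetical service that fails whenever its dependency is down."""
    return not dependency_down


if __name__ == "__main__":
    traffic = list(range(100))
    print(run_experiment(traffic, resilient_handler))
    print(run_experiment(traffic, fragile_handler))
```

In a real setup the remaining two principles would apply as well: the experiment would run against production traffic and be scheduled to repeat automatically rather than being invoked by hand.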
This post is full of thought-provoking questions on the nature of configuration changes and incidents.
Lorin Hochstein
Outages
- IBM Cloud
- Klarna
  - Klarna showed users information related to other users, as detailed in this followup post.