SRE Weekly Issue #272
Articles
Salesforce has posted a ton of information about their major outage two weeks ago.
It involved a change to their DNS system which, combined with an issue in BIND daemon shutdown, prevented the daemon from starting back up.
The analysis goes into great detail on the fact that an engineer used the Emergency Break-Fix (EBF) process to rush out the DNS configuration change.
In this case, the engineer subverted the known policy and the appropriate disciplinary action has been taken to ensure this does not happen in the future.
Thanks to an anonymous reader for pointing this out to me.
Salesforce
This article calls out the heavily blame-ridden language in the above incident analysis and the briefing given by Salesforce’s Chief Availability Officer.
I’m dismayed to see such language from someone who is at the C-level for reliability.
“For whatever reason that we don’t understand, the employee decided to do a global deployment,” Dieken went on.
Richard Speed — The Register
…and the Twittersphere agrees with me.
If you want to blame someone, maybe try blaming the “chief availability officer” who oversees a system so fragile that one action by one engineer can cause this much damage. But it’s never that simple, is it.
@ReinH on Twitter
Another really great take on the Salesforce outage followup.
Lorin Hochstein
I like how this article covers the different roles that SREs play.
Emily Arnott — Blameless
The principles covered in this article are:
- Build a hypothesis around steady-state behavior
- Vary real-world events
- Run experiments in production
- Automate experiments to run continuously
- Minimize blast radius
Casey Rosenthal — Verica
This post is full of thought-provoking questions on the nature of configuration changes and incidents.
Lorin Hochstein
Outages
- IBM Cloud
- Klarna
  - Klarna showed users information related to other users, as detailed in this followup post.