SRE Weekly Issue #266
Articles
This one was brought to my attention by Dr. Richard Cook, who also pointed me to the AAIB incident report.
Dr. Cook went on to share these insights with me, which I’ve copied here with permission:
Note:
- the subtle interactions allowed the manual correction to be lost during the interval between recognizing the software problem and having the corrected software functionally ‘catch’ the Ms/Miss title mixup;
- the incident is attributed to “a simple flaw in the programming of the IT system” rather than failure of the workarounds that were put in place after the problem was recognized;
- the report is careful to demonstrate that the flaws in the system made only a slight difference to the flight parameters;
- the report does not describe any IT process changes whatsoever!
The report has the effect of making the incident appear to be an unfortunate series of occurrences rather than being emblematic of the way that these sorts of processes are vulnerable.
Last year’s SRE From Home event was awesome, and this year’s iteration looks to be just as great.
Catchpoint
This is fun! Try your hand at troubleshooting a connection issue in this game-ified role-play scenario.
BONUS CONTENT: Read about the author’s motivations, design decisions, and plans here.
Julia Evans
Do we need to have some kind of Pillars Registry? Note, these are more like pillars of high availability than resilience engineering.
Hector Aguilar — Okta
I love this idea that we’re trying to get deep incident analysis done even though that may not be the actual goal of the organization.
As LFI analysts, we’re exploiting this desire for closure to justify spending time examining how work is really done inside of the system.
Lorin Hochstein
This is well worth a read if only for the on-call scenario at the start. Yup, been there. We miss you, Harry.
Harry Hull — Blameless
What’s the difference? Click through to learn about the distinction they’re drawing.
Amir Kazemi — effx
The New York Times’s Operations Engineering group developed an Operational Maturity Assessment and uses it to have collaborative conversations with teams about their systems.
The NYT Open Team — New York Times
Outages
- G-Suite
- Google posted this “Mini Incident Report while full Incident Report is prepared.”
- Slack
- Docker Hub
- Robinhood
- Elevated CDN Errors
- Heroku