SRE Weekly Issue #231
I have a special treat for you this week: 7 detailed incident reports! Just a note, I’ll be on vacation next week, so I’ll see you in two weeks on August 23.
Articles
The lead SRE at Under Armour(!) has a ton of interesting things to share about how they do SRE. I love their approach to incident retrospectives that starts with 1:1 interviews with those involved.
Paul Osman — Under Armour (Blameless Summit)
Routine infrastructure maintenance had unintended consequences, saturating MySQL with excessive connections.
Daniel Messer — Red Hat
This report details the complex factors that contributed to the failure of a dam in Michigan in May of this year.
Jason Hayes — Mackinac Center for Public Policy
This incident involved a DNS failure in Heroku’s infrastructure provider (presumably AWS).
Heroku
This incident at LinkedIn impacted multiple internal customers with varying requirements for durability and latency, making recovery complex.
Sandhya Ramu and Vasanth Rajamani — LinkedIn
This report includes a description of an incident involving Kubernetes pods and an impaired DNS service.
Keith Ballinger — GitHub
In this report, Honeycomb describes how they investigated an incident from the prior week that their monitoring had missed.
Martin Holman — Honeycomb
Outages
- Discord
  - This one is notable because it involves a purported “noisy neighbor” situation in Google Cloud Platform.
- Slack
- Canon
- Steam
  - Some sites loading slowly
- Indeed
- Fastly