SRE Weekly Issue #233
Articles
In this post, I’ll share how we ensured that Meet’s available service capacity was ahead of its 30x COVID-19 usage growth, and how we made that growth technically and operationally sustainable by leveraging a number of site reliability engineering (SRE) best practices.
Samantha Schaevitz — Google
I love the concept of “battleshorts” just as much as I’ve been enjoying this series of articles analyzing STAMP.
Lorin Hochstein
Honeycomb had 5 incidents in just over a week, prompting not only their normal incident investigation process, but a meta-analysis of all five together.
Emily Nakashima — Honeycomb
Why is Chromium responsible for half of the DNS queries to the root nameservers? And why do they all return NXDOMAIN?
Matthew Thomas — APNIC
“That Moment” when your fire suppression system triggers and the fire department shows up. This is part war story and part description of incident response practices.
Ariel Pisetzky — Taboola
An overload in an internal blob storage system impacted many dependent services.
Sharding as a service, now there’s an interesting idea.
Gerald Guo, Thawan Kooburat — Facebook
In Kubernetes Operators: Automating the Container Orchestration Platform, authors Jason Dobies and Joshua Wood describe an Operator as “an automated Site Reliability Engineer for its application.” Given an SRE’s multifaceted experience and diverse workload, this is a bold statement. So what exactly can the Operator do?
Emily Arnot — Blameless
Outages
- Zoom
- Slack
- Let’s Encrypt
- NZX (New Zealand Stock Exchange)
- eBay
- Garmin
- Heroku
- Fastly
- Also this one.
Full disclosure: Fastly is my employer.
- Also this one.
- Cloudflare