SRE Weekly Issue #455

A message from our sponsor, FireHydrant:

FireHydrant Retrospectives are now more customizable and collaborative than ever with custom templates, AI-generated answers, collaborative editing... all exportable to Google Docs and Confluence. See how our retros can save you 2+ hours on every incident.

https://firehydrant.com/blog/welcome-to-your-new-retrospective-experience-more-customizable-collaborative/

This article covers six methods for mitigating thundering herd problems, with a pretty diagram for each.

  Sid
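The article's six methods aren't listed here, but one widely used mitigation is exponential backoff with "full jitter": each client waits a random time between zero and an exponentially growing ceiling, so clients that failed together don't all retry together. Below is a minimal Python sketch of that one technique. It is not necessarily one of the article's six, and the names `backoff_with_full_jitter`, `base`, and `cap` are hypothetical.

```python
import random
from typing import Optional


def backoff_with_full_jitter(
    attempt: int,
    base: float = 0.1,
    cap: float = 10.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how long (in seconds) to sleep before retry number `attempt` (0-based).

    The ceiling grows exponentially (base * 2**attempt) but never exceeds `cap`;
    the actual delay is drawn uniformly from [0, ceiling] so that clients which
    failed at the same moment spread their retries out instead of stampeding.
    """
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt counts can't overflow a float.
    exponent = min(attempt, 63)
    ceiling = min(cap, base * (2.0 ** exponent))
    return source.uniform(0.0, ceiling)


if __name__ == "__main__":
    # 1,000 clients all failed at once and are now on their 4th attempt (attempt=3).
    r = random.Random(7)
    delays = sorted(backoff_with_full_jitter(3, rng=r) for _ in range(1000))
    print(f"earliest retry: {delays[0]:.3f}s, latest retry: {delays[-1]:.3f}s")
```

Without the jitter, all 1,000 clients in the usage example would retry at exactly 0.8 seconds. With it, their retries are spread across the whole 0 to 0.8 second window.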

Some thoughts on the "second victim" concept. Full disclosure: I was one of the participants in the discussion this article is based on.

  Fractal Flame

Written in response to a question about the big CrowdStrike outage earlier this year, this article asks: do we need to start using safer languages?

  Kode Vicious — ACM Queue

This one used a cool technique I hadn't seen before: they hardcoded a cutoff time into both the old and new systems, so the two cut over automatically at the same moment.

  Md Riyadh, Jia Long Loh, Muqi Li, and Pu Li — Grab
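Here is a minimal sketch of the hardcoded-cutoff idea. It assumes both systems read the same UTC timestamp and have reasonably synchronized clocks. The constant `CUTOVER_AT`, its date, and the function `should_handle` are hypothetical and are not taken from Grab's write-up.

```python
from datetime import datetime, timezone

# Hypothetical cutover instant, compiled into BOTH the old and the new system.
# Using an explicit UTC timestamp avoids disagreements caused by local time zones.
CUTOVER_AT = datetime(2030, 1, 1, 0, 0, 0, tzinfo=timezone.utc)


def should_handle(system: str, now: datetime) -> bool:
    """Decide whether `system` ("old" or "new") should process work at `now`.

    The old system stops exactly when the new one starts, so at any instant
    exactly one of them claims the work, with no runtime coordination needed.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    if system == "old":
        return now < CUTOVER_AT
    if system == "new":
        return now >= CUTOVER_AT
    raise ValueError(f"unknown system: {system!r}")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("old active:", should_handle("old", now), "new active:", should_handle("new", now))
```

Because the cutoff is a constant rather than a message sent at switch time, neither system depends on the other being reachable at the moment of cutover. The sketch ignores clock skew. On real hosts, skew can open a brief window where both systems act, or neither does.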

Here's a great writeup of a problem with the UK flight system involving a latent bug. Among several cool takeaways, I really liked the way the official incident report didn't try to pretend this weird bug could have been foreseen and prevented.

  Chris Evans — incident.io

This game day turned out way more exciting than intended: it exposed a serious Kubernetes configuration flaw and caused a real outage. Oops!

  Lawrence Jones

It's all fun and games until someone accidentally uses too much DTAZ (data transfer between availability zones). Good monitoring story, too!

  Grzegorz Skołyszewski — Prezi

OpenAI posted this writeup of an incident earlier this week. They tried to deploy detailed monitoring for their Kubernetes cluster, but the monitoring system overloaded the Kubernetes API.

  OpenAI
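To see how a monitoring rollout can overwhelm a cluster's control plane, consider a deliberately simplified, hypothetical model. It is not a description of OpenAI's actual telemetry service. In this model, every node runs an agent that lists every pod in the cluster on each poll. The cost of each query grows with cluster size, and so does the number of agents, so total load grows with the square of the node count. The function and its parameters are illustrative only.

```python
def objects_listed_per_second(nodes: int, pods_per_node: int, poll_interval_s: float) -> float:
    """Hypothetical model of API-server load from a per-node monitoring agent.

    Every node runs one agent, and each agent lists *all* pods in the cluster
    once per poll interval. Returns the total number of pod objects the API
    server must return per second across all agents.
    """
    if nodes < 0 or pods_per_node < 0 or poll_interval_s <= 0:
        raise ValueError("invalid parameters")
    # The cost of ONE list call grows with cluster size...
    pods_in_cluster = nodes * pods_per_node
    # ...and the number of agents making that call also grows with cluster size.
    return nodes * pods_in_cluster / poll_interval_s


if __name__ == "__main__":
    for n in (100, 200, 400):
        load = objects_listed_per_second(n, pods_per_node=30, poll_interval_s=60)
        print(n, "nodes ->", load, "pod objects/s")
```

In this model, each doubling of the node count quadruples the load. Behavior that looks harmless on a small cluster can therefore become overwhelming on a large one.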

And here's Lorin Hochstein's analysis of OpenAI's incident writeup, including a recurring theme:

This is a great example of unexpected behavior of a subsystem whose primary purpose was to improve reliability.

  Lorin Hochstein

SRE Weekly, a production of Tinker Tinker Tinker, LLC · PO Box 253 · South Lancaster, MA 01561-0253 · USA
