SRE Weekly Issue #399


A message from our sponsor, FireHydrant:

Severity levels help responders and stakeholders understand the incident impact and set expectations for the level of response. This can mean jumping into action faster. But first, you have to ensure severity is actually being set. Here’s one way.
https://firehydrant.com/blog/incident-severity-why-you-need-it-and-how-to-ensure-its-set/

This research paper summary goes into Mode Error and the dangers of adding more features to a system in the form of modes, especially if the system can change modes on its own.

  Fred Hebert (summary)
  Dr. Nadine B. Sarter (original paper)

Cloudflare suffered a power outage in one of the datacenters housing their control and data planes. The outage itself is intriguing, and in its aftermath, Cloudflare learned that their system wasn’t as HA as they thought.

Lots of great lessons here, and if you want more, they posted another incident writeup recently.

  Matthew Prince — Cloudflare

Separating write from read workloads can increase complexity but also open the door to greater scalability, as this article explains.

  Pier-Jean Malandrino
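
To make the read/write split concrete, here is a minimal in-memory sketch. It is not taken from the linked article: the names OrderStore, SpendByCustomer, place_order, and total_for are hypothetical, and a real system would usually put a queue or log between the two sides rather than a direct callback.

```python
from dataclasses import dataclass, field


@dataclass
class OrderStore:
    """Write side (hypothetical): the authoritative, append-only list of orders."""
    events: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def place_order(self, customer: str, amount: float) -> None:
        event = {"customer": customer, "amount": amount}
        self.events.append(event)
        # Publish the change so read-side views can update themselves.
        for notify in self.subscribers:
            notify(event)


@dataclass
class SpendByCustomer:
    """Read side (hypothetical): a denormalized view built only for queries."""
    totals: dict = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        customer = event["customer"]
        self.totals[customer] = self.totals.get(customer, 0.0) + event["amount"]

    def total_for(self, customer: str) -> float:
        return self.totals.get(customer, 0.0)


store = OrderStore()
view = SpendByCustomer()
store.subscribers.append(view.apply)

store.place_order("ada", 10.0)
store.place_order("ada", 5.0)
print(view.total_for("ada"))
```

The extra complexity shows up even at this size: there are now two models to keep in sync. In exchange, the read view can be shaped and scaled independently of the write path.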

Covers four strategies for load shedding, with code examples:

  • Random Shedding
  • Priority-Based Shedding
  • Resource-Based Shedding
  • Node Isolation

  Code Reliant
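
As a rough illustration of the first two strategies in the list, here is a minimal sketch. It is not the article's code: the function names, the 0.8 threshold, and the per-priority limits are assumptions chosen for the example, and load is assumed to be a utilization figure between 0 and 1.

```python
import random


def random_shed(load: float, threshold: float = 0.8, rng=random.random) -> bool:
    """Random shedding: return True if this request should be dropped.

    Below the threshold nothing is dropped. Above it, the drop probability
    grows linearly until every request is dropped at full load.
    """
    if load <= threshold:
        return False
    drop_probability = min(1.0, (load - threshold) / (1.0 - threshold))
    return rng() < drop_probability


# Hypothetical limits: priority 0 is the most important traffic.
PRIORITY_LIMITS = {0: 1.0, 1: 0.9, 2: 0.8}


def priority_shed(load: float, priority: int) -> bool:
    """Priority-based shedding: drop less important requests first."""
    return load > PRIORITY_LIMITS[priority]


print(random_shed(0.5))       # well below the threshold, so never dropped
print(priority_shed(0.85, 2))  # lowest priority is shed first
print(priority_shed(0.85, 1))  # medium priority still gets through
```

Passing the random source in as a parameter (rng) is only there to make the behavior easy to test; production code would more likely read load from a metrics source and use the default.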

Lots of juicy details about the three outages, including a link to AWS’s write-up of their Lambda outage in June.

  Gergely Orosz

The diagrams in this article are especially useful for understanding how the circuit-breaker pattern works.

  Pier-Jean Malandrino
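
For readers who prefer code to diagrams, here is a minimal sketch of the closed, open, and half-open states of a circuit breaker. It is an illustration under assumptions, not the article's implementation: the class name, the thresholds, and the injectable clock are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each outbound request, for example breaker.call(fetch_profile, user_id), where fetch_profile stands in for whatever dependency is being protected, and treats CircuitOpenError as a signal to fall back or fail fast.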

This one’s about how on-call can go bad, and how to structure your team’s on-call so that it is livable and sustainable.

  Michael Hart

Execs cast a big shadow in an incident, so it’s important to have a plan for how to communicate with them, as this article explains.

  Ashley Sawatsky — Rootly








