SRE Weekly Issue #457


A message from our sponsor, FireHydrant:

This New Year, resolve to make incident management smarter, faster, and way less stressful with FireHydrant. Modern on-call, automated incident response, and AI tools that do the heavy lifting.

https://firehydrant.com/

In this post, we’ll explore the reasons that OOM kills can occur and provide tactics to combat and prevent them.

  Will Searle — Causely

The high plateau of basic resilience is the third interim stop companies tend to reach on their journey towards resilience.

I especially enjoyed the bit about how trying to add robustness can paradoxically diminish overall reliability, reminiscent of Lorin Hochstein and others.

  Uwe Friedrichsen

What happens when you move your DB and network latency goes from 0.5ms to 10ms? Time to find out by experimenting (carefully).

  Lawrence Jones

I've only used Kubernetes under Amazon EKS, which handles running etcd, so this guide helped fill in some gaps in my knowledge. Of course, under EKS, you still need to pay attention to etcd.

  David M. Lentz — Datadog

Google folks share how they've applied System-Theoretic Accident Model and Processes (STAMP) to SRE at Google. This really stood out to me:

A design might implement its requirements flawlessly. But what if requirements necessary for the system to be safe were incorrect or, even worse, missing altogether? 

  Tim Falzone and Ben Treynor Sloss — USENIX ;login:

Search and rescue (SAR) operations and incident response have striking similarities. In this series, Claire dives into lessons SREs can learn from the Incident Command System (ICS) used in wildfire management.

I really love learning about ICS from the veterans who use it for actual emergencies!

  Claire Leverne — Rootly

Runbooks are programs for an imperfect execution engine of highly variable quality.

What happens when the runbook meets reality?

  Jos Visser

This is a really great one! Several factors combined to cause the outage, and they're all laid out in juicy detail.

  Brendan Humphreys — Canva

Here's Lorin Hochstein's take on Canva's outage report.

  Lorin Hochstein







