SRE Weekly Issue #469
I've shared this article before, but it's so critical that it's time to give it another read. MTTR is a statistically useless metric, and by using it, we will draw faulty conclusions and potentially take harmful actions. Courtney Nash does a really great job of laying out the science in an easy-to-understand way.
Courtney Nash — Resilience in Software Foundation / The VOID
I like the analogy here: when we say people are components in our sociotechnical systems, system diagrams are like a form of cache.
Clint Byrum
From Werner Vogels's intro to this article:
Andy takes us through S3’s evolution from simple object store to sophisticated data platform, illustrating how customer feedback has shaped every aspect of the service. It’s a fascinating look at how we maintain simplicity even as systems scale to handle hundreds of trillions of objects.
Andy Warfield — Amazon
Instead of a traditional Cost/Performance/Reliability trade-off, this article argues that serverless presents a trade-off among Cost, Performance, and Complexity.
Luc van Donkersgoed
Google uses System-Theoretic Process Analysis (STPA) to identify problems in their systems. They found that the most effective way to spread adoption of STPA was to build their own training program.
Garrett Holthaus — Google
So far, I'm liking this new post series from Nextdoor about their efforts to scale their datastore. Here's the first installment, about the things they've tried up to now.
I'll share the rest of the series as I work my way through them.
Slava Markeyev — Nextdoor
Wow, I had no idea EBS volumes failed this often!
Nick Van Wiggeren — PlanetScale