SRE Weekly - SRE Weekly Issue #341
Articles
My coworkers referred to a system “going metastable”, and when I asked what that was, they pointed me to this awesome paper.
Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is `removed.
  Nathan Bronson, Aleksey Charapko, Abutalib Aghayev, and Timothy Zhu
Honeycomb posted this incident report involving a service hitting the open file descriptors limit.
  Honeycomb
  Full disclosure: Honeycomb is my employer.
Lots of interesting answers to this one, especially when someone uttered the phrase:
engineers should not be on call
u/infomaniac89 and others — reddit
A misbehaving internal Google service overloaded Cloud Filestore, exceeding its global request limit and effectively DoSing customers.
An in-depth look at how Adobe improved its on-call experience. They used a deliberate plan to change their team’s on-call habits for the better.
Bianca Costache — Adobe
This one contains an interesting observation: they found that outages caused by a cloud providers take longer to solve.
Jeff Martens — Metrist
Even if you don’t agree with all of their reasons, it’s definitely worth thinking about.
Danny Martinez — incident.io
This one covers common reliability risks in APIs and techniques for mitigating them.
Utsav Shah
The evolution beyond separate Dev and Ops teams continues. This article traces the path through DevOps and into platform-focused teams.
  Charity Majors — Honeycomb
  Full disclosure: Honeycomb is my employer.
|
Older messages
SRE Weekly Issue #340
Monday, September 26, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms
SRE Weekly Issue #339
Monday, September 19, 2022
View on sreweekly.com It's with great sadness that I note the passing of a giant in our field, Dr. Richard Cook. His memory will live on through his huge body of work and the countless ways
SRE Weekly Issue #338
Monday, September 12, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms
SRE Weekly Issue #337
Monday, September 5, 2022
View on sreweekly.com Thanks for all the vacation well-wishes! It was really great and relaxing. Take vacations, it's important for reliability! While I was out, I shipped the past two issues with
SRE Weekly Issue #336
Monday, August 29, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
You Might Also Like
Unpacking “Craft” in the Software Interface & The Five Pillars of Creative Flow
Monday, November 25, 2024
Systems Over Substance, Anytype's autumn updates, Ghost's progress with its ActivityPub integration, and a lot more in this week's issue of Creativerly. Creativerly Unpacking “Craft” in the
What Investors Want From AI Startups in 2025
Monday, November 25, 2024
Top Tech Content sent at Noon! How the world collects web data Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, November 25, 2024? The HackerNoon
GCP Newsletter #426
Monday, November 25, 2024
Welcome to issue #426 November 25th, 2024 News LLM Official Blog Vertex AI Announcing Mistral AI's Large-Instruct-2411 on Vertex AI - Google Cloud has announced the availability of Mistral AI's
⏳ 36 Hours Left: Help Get "The Art of Data" Across the Finish Line 🏁
Monday, November 25, 2024
Visual Capitalist plans to unveal its secrets behind data storytelling, but only if the book hits its minimum funding goal. View Online | Subscribe | Download Our App We Need Your Help Only 36 Hours
DeveloPassion's Newsletter #180 - Black Friday Week
Monday, November 25, 2024
Edition 180 of my newsletter, discussing Knowledge Management, Knowledge Work, Zen Productivity, Personal Organization, and more! Sébastien Dubois DeveloPassion's Newsletter DeveloPassion's
Meet HackerNoon's Latest Features: Boost Stories with Translations, Speech-to-Text & More
Monday, November 25, 2024
Hey, Hacker! HackerNoon's monthly product update is here! Get ready for a new version of the mobile app, more translation developments, a new AI Gallery, backend moves, and more! 🚀 This product
The ultimate holiday gadget gift
Monday, November 25, 2024
AI isn't hitting a wall; $70 off Apple Watch; 60+ Amazon deals -- ZDNET ZDNET Tech Today - US November 25, 2024 Meta Quest 3S Why the Meta Quest 3S is the ultimate 2024 holiday present This $299
Deduplication in Distributed Systems: Myths, Realities, and Practical Solutions
Monday, November 25, 2024
This week, we'll discuss the deduplication strategies. We'll see whether they're useful and consider scenarios where you may need them. We'll also do a reality check with the promises
How to know if your data has been exposed
Monday, November 25, 2024
How do you know if your personal data has been leaked? Imagine getting an instant notification if your SSN, credit card, or password has been exposed on the dark web — so you can take action
⚙️ Amazon and Anthropic
Monday, November 25, 2024
Plus: The hidden market of body-centric data