SRE Weekly - SRE Weekly Issue #321

View on sreweekly.com

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):
https://rootly.com/demo/

Articles

A researcher explains how they implemented their microservice failure testing tool at DoorDash. The tool, Fillibuster, automatically discovers microservice dependencies and injects faults, avoiding the need to design specific individual failure scenarios.

  Christopher Meiklejohn — DoorDash

Last week, I shared Atlassian’s outage write-up. This link is a Twitter thread with a critique.

I feel like it is perhaps not a “good look” to repeatedly try to sell your product in your writeup about your product’s catastrophic outage

  @ReinH

“Error” serves a number of functions for an organization: as a defense against entanglement, the illusion of control, as a means for distancing, and as a marker for a failed investigation.

  Eric Dobbs

This is a write-up posted in January for an incident that occurred during an infrastructure migration. I feel like I can relate to every one of the learnings.

  Enom (Tucows)

In the past two years, I’ve been participating in on-call rotations as a Site Reliability Engineer at Vinted. Here are some of the practical lessons I’ve learned about the process.

  Ernestas Narmontas

This article is all about finding out what risks exist that may impact your ability to meet your SLOs. Once you’ve done that, you can determine whether your SLOs are realistic.

  Ayelet Sachto — Google

When your organization chooses to implement SLOs, how do you get everyone on board? This two-part series has an in-depth look at how Klarna did it.

  Andrew Cartine — Klarna

Subtitle: And why do SRE teams need PMs?

After laying out the reasons why SREs need PMs, this article goes into detail about what a PM can bring to an SRE team.

  António Araújo — detech.ai

BellJar helps users find cyclic dependencies in their services, by running totally isolated VMs and requiring users to explicitly enable every external dependency they need in order to bootstrap each service. It has a really neat feature of automatically generating runbooks based on these test cases.

  Christopher Bunn and Jie Huang — Meta

This week, I watched Netflix’s Meltdown: Three Mile Island, a documentary about the nuclear accident in the US in 1979. It’s not exactly a post-incident write-up, but there’s a lot in there about normalization of deviance, situational awareness, and risk-taking (both in and out of incidents).

  Netflix

Outages

  • Slack
  • Heroku
    • Heroku’s been dealing with a security incident since April 13. They performed a mass password reset of all accounts and their GitHub integration has been disabled for days.

  • Roblox






This email was sent to you
why did I get this?    unsubscribe from this list    update subscription preferences
SRE Weekly · PO Box 253 · South Lancaster, MA 01561-0253 · USA

Older messages

SRE Weekly Issue #320

Monday, May 2, 2022

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and

SRE Weekly Issue #319

Monday, April 25, 2022

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and

SRE Weekly Issue #318

Monday, April 18, 2022

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and

SRE Weekly Issue #317

Monday, April 11, 2022

View on sreweekly.com Bit of a short issue this week, as I'm currently recovering from COVID-19. Please don't worry! I seem to have a very minor case, likely thanks in large part to vaccination

SRE Weekly Issue #316

Monday, April 4, 2022

View on sreweekly.com I'm on vacation, so I prepared this issue in advance. Practically speaking, that just means there's no Outages section this week. See you all next week! PS Okay, I know I

You Might Also Like

Software Testing Weekly - Issue 253

Monday, January 13, 2025

Software Testing Weekly turns 5! 🥳 View on the Web Archives ISSUE 253 January 13th 2025 COMMENT Welcome to the 253rd issue! Oh my, time flies! It's hard to believe this week marks 5 years since I

CES 2025 - Sync #501

Sunday, January 12, 2025

Plus: Sam Altman reflects on the last two years; Anthropic reportedly in talks to raise $2B at $60B valuation; e-tattoo decodes brainwaves; anthrobots; top 25 biotech companies for 2025; and more! ͏ ͏

PD#608 Mistakes engineers make in large established codebases

Sunday, January 12, 2025

You can't practice it beforehand ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌ ͏ ‌

C#539 A detailed look at EF Core’s JSON Columns feature

Sunday, January 12, 2025

Comparing it with the traditional tables with indexes ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

RD#488 How to avoid issues with custom Hooks

Sunday, January 12, 2025

Using them carelessly can lead to many problems ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Daily Coding Problem: Problem #1666 [Easy]

Sunday, January 12, 2025

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Amazon. Given n numbers, find the greatest common denominator between them. For example,

🛜 Here's What Happens to Old Websites — Features the Pixel Should Copy From Samsung's One UI 7

Sunday, January 12, 2025

Also: What Instagram Needs to Compete With TikTok, and More! How-To Geek Logo January 12, 2025 Did You Know Mount Wingen, located near Wingen, New South Wales in Australia, is better known as Burning

☁️ Azure Weekly #498 - 12th January 2025

Sunday, January 12, 2025

Festive Tech Calendar 2024 recap, GitHub Copilot Bootcamp, and Dev Containers FTW! ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌

Sunday Digest | Featuring 'The Income Needed to Join the Top 1% in Every U.S. State' 📊

Sunday, January 12, 2025

Every visualization published this week, in one place. Jan 12, 2025 | View Online | Subscribe | VC+ | Download Our App Hello, welcome to your Sunday Digest. This week, we visualized the value of the

Android Weekly #657 🤖

Sunday, January 12, 2025

View in web browser 657 January 12th, 2025 Android Weekly Updates Follow us on BlueSky We're there as well! Articles & Tutorials Sponsored Multi-Layered Mobile App Protection Attackers