SRE Weekly - SRE Weekly Issue #295

View on sreweekly.com

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo:
https://rootly.com/?utm_source=sreweekly

Articles

I love this crystal clear argument based on statistics and research. MTTR as a metric is simply meaningless.

Courtney Nash — Verica

Their steps for better communication during an outage:

  • Provide context to minimise speculation
  • Explain what you’re doing to demonstrate you’re ‘on it’
  • Set some expectations for when things will return to normal
  • Tell people what they should do0
  • Let folks know when you’ll be updating them next

Chris Evans — incident.io

Despite checking in advance to be sure their systems would support the new Let’s Encrypt certificate chain, they ran into trouble.

[…] we discovered that several HTTP client libraries our systems use were using their own vendored root certificates.

Heroku

This is the best case I’ve seen yet against multi-cloud infrastructure. I really like the airline analogy.

Lydia Leong

Roblox had a major, several-day outage starting on October 28. I don’t usually include game outages in the Outages section since they’re so common and there’s not usually much information to learn from, I sure do like a good post-incident report. Thanks, folks!

David Baszucki — Roblox

When you’re sending small TCP packets, two optimizations can conspire to introduce an artificial 40 millisecond (not megasecond…) delay.

Vorner

_Here’s Google’s follow-up report for their October 25-26 Meet outage.

Should you count failed requests toward your SLI if the client retries and succeeds? A good argument can be made on either side.

u/Sufficient_Tree4275 and other Reddit users

Mercari restructured its SRE team, moving toward an embedded model to adapt to their growing microservice architecture.

ShibuyaMitsuhiro — Mercari

There’s a really great discussion in this episode about leaving slack in the system in the form of bits of capacity and inefficiency that can be drawn upon to buy time during an outage.

Courtney Nash, with guests Liz Fong-Jones and Fred Hebert — Verica

Here’s how non-SREs can use SRE principles to improve their systems.

Laurel Frazier — Transposit

Outages







This email was sent to you
why did I get this?    unsubscribe from this list    update subscription preferences
SRE Weekly · PO Box 253 · South Lancaster, MA 01561-0253 · USA

Older messages

SRE Weekly Issue #294

Monday, November 1, 2021

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right

SRE Weekly Issue #293

Monday, October 25, 2021

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right

SRE Weekly Issue #292

Monday, October 18, 2021

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right

SRE Weekly Issue #291

Monday, October 11, 2021

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right

SRE Weekly Issue #290

Monday, October 4, 2021

View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right

You Might Also Like

🔒 The Vault Newsletter: November issue 🔑

Monday, November 25, 2024

Get the latest business security news, updates, and advice from 1Password. ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏

🧐 The Most Interesting Phones You Didn't See in 2024 — Making Reddit Faster on Older Devices

Monday, November 25, 2024

Also: Best Black Friday Deals So Far, and More! How-To Geek Logo November 25, 2024 Did You Know If you look closely over John Lennon's shoulder on the iconic cover of The Beatles Abbey Road album,

JSK Daily for Nov 25, 2024

Monday, November 25, 2024

JSK Daily for Nov 25, 2024 View this email in your browser A community curated daily e-mail of JavaScript news JavaScript Certification Black Friday Offer – Up to 54% Off! Certificates.dev, the trusted

Ranked | How Americans Rate Business Figures 📊

Monday, November 25, 2024

This graphic visualizes the results of a YouGov survey that asks Americans for their opinions on various business figures. View Online | Subscribe Presented by: Non-consensus strategies that go where

Spyglass Dispatch: Apple Throws Their Film to the Wolves • The AI Supercomputer Arms Race • Sony's Mobile Game • The EU Hunts Bluesky • Bluesky Hunts User Trust • 'Glicked' Pricked • One Massive iPad

Monday, November 25, 2024

Apple Throws Their Film to the Wolves • The AI Supercomputer Arms Race • Sony's Mobile Game • The EU Hunts Bluesky • Bluesky Hunts User Trust • 'Glicked' Pricked • One Massive iPad The

Daily Coding Problem: Problem #1619 [Hard]

Monday, November 25, 2024

Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Google. Given two non-empty binary trees s and t , check whether tree t has exactly the

Unpacking “Craft” in the Software Interface & The Five Pillars of Creative Flow

Monday, November 25, 2024

Systems Over Substance, Anytype's autumn updates, Ghost's progress with its ActivityPub integration, and a lot more in this week's issue of Creativerly. Creativerly Unpacking “Craft” in the

What Investors Want From AI Startups in 2025

Monday, November 25, 2024

Top Tech Content sent at Noon! How the world collects web data Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, November 25, 2024? The HackerNoon

GCP Newsletter #426

Monday, November 25, 2024

Welcome to issue #426 November 25th, 2024 News LLM Official Blog Vertex AI Announcing Mistral AI's Large-Instruct-2411 on Vertex AI - Google Cloud has announced the availability of Mistral AI's

⏳ 36 Hours Left: Help Get "The Art of Data" Across the Finish Line 🏁

Monday, November 25, 2024

Visual Capitalist plans to unveal its secrets behind data storytelling, but only if the book hits its minimum funding goal. View Online | Subscribe | Download Our App We Need Your Help Only 36 Hours