SRE Weekly - SRE Weekly Issue #295


A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating the incident channel, Jira ticket, and Zoom call, paging the right team, assembling the postmortem timeline, setting up reminders, and more. Book a demo:
https://rootly.com/?utm_source=sreweekly

Articles

I love this crystal clear argument based on statistics and research. MTTR as a metric is simply meaningless.

Courtney Nash — Verica
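One way to see the statistical problem is to look at right-skewed data. When most incidents are short and a few run very long, the handful of long ones dominates the mean. The sketch below compares a mean time to resolve with the median. The durations are made up for illustration and are not data from the article.

```python
from statistics import mean, median

# Hypothetical incident durations in minutes (illustrative only, not real data).
# Most incidents are short; one runs very long, giving a right-skewed distribution.
durations = [5, 8, 10, 12, 15, 20, 25, 30, 45, 600]

print(f"mean (MTTR): {mean(durations):.1f} min")
print(f"median:      {median(durations):.1f} min")

# Adding a single day-long incident moves the mean far more than the median.
with_outlier = durations + [1440]
print(f"mean after one more long incident:   {mean(with_outlier):.1f} min")
print(f"median after one more long incident: {median(with_outlier):.1f} min")
```

In this toy data, one extra incident moves the mean from 77.0 to about 200.9 minutes. Over the same change, the median moves only from 17.5 to 20. Neither number says much about how reliable the system actually is.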

Their steps for better communication during an outage:

  • Provide context to minimise speculation
  • Explain what you’re doing to demonstrate you’re ‘on it’
  • Set some expectations for when things will return to normal
  • Tell people what they should do
  • Let folks know when you’ll be updating them next

Chris Evans — incident.io
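A team could make those five steps hard to skip by giving every status update one required field per step. This is a minimal sketch of that idea. The `StatusUpdate` class, its field names, and the rendered wording are hypothetical and are not taken from incident.io.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StatusUpdate:
    """Hypothetical outage update with one required field per communication step."""
    context: str           # what is happening, to minimise speculation
    actions: str           # what responders are doing, to show you're "on it"
    expectation: str       # when things are expected to return to normal
    user_guidance: str     # what readers should do in the meantime
    next_update: datetime  # when the next update will be posted (timezone-aware)

    def render(self) -> str:
        return "\n".join([
            f"What's happening: {self.context}",
            f"What we're doing: {self.actions}",
            f"Expected recovery: {self.expectation}",
            f"What you should do: {self.user_guidance}",
            f"Next update: {self.next_update:%H:%M %Z}",
        ])


# Example usage with made-up details.
update = StatusUpdate(
    context="Checkout requests are failing for some customers in the EU region.",
    actions="We've rolled back this morning's deploy and are watching error rates.",
    expectation="We expect checkout to recover within the next 30 minutes.",
    user_guidance="Retry failed orders; you will not be charged twice.",
    next_update=datetime(2021, 11, 8, 14, 30, tzinfo=timezone.utc),
)
print(update.render())
```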

Despite checking in advance to be sure their systems would support the new Let’s Encrypt certificate chain, they ran into trouble.

[…] we discovered that several HTTP client libraries our systems use were using their own vendored root certificates.

Heroku
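Some background helps here. Let's Encrypt's older root, DST Root CA X3, expired on September 30, 2021. Updating the operating system trust store does not help a client library that ships its own CA bundle. For example, Python's `requests` uses the bundle from the `certifi` package rather than the OS store.

A rough way to find such vendored bundles is to scan dependency directories for PEM files that contain several certificates. The sketch below does that. The `find_vendored_bundles` helper is hypothetical, and the file-suffix heuristic is an assumption; it will miss bundles embedded in code or binaries.

```python
from pathlib import Path
from typing import Dict

PEM_MARKER = "-----BEGIN CERTIFICATE-----"
SUFFIXES = {".pem", ".crt", ".cer"}  # assumption: common extensions for CA bundles


def find_vendored_bundles(root: str) -> Dict[str, int]:
    """Return {path: certificate_count} for PEM-looking files under root.

    Heuristic only: a file holding many certificates inside a dependency is
    likely a vendored CA bundle that ignores the OS trust store.
    """
    found: Dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUFFIXES:
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file; skip it
            count = text.count(PEM_MARKER)
            if count:
                found[str(path)] = count
    return found

# Example: scan each directory returned by site.getsitepackages() and
# review any file reporting dozens of certificates.
```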

This is the best case I’ve seen yet against multi-cloud infrastructure. I really like the airline analogy.

Lydia Leong

Roblox had a major, several-day outage starting on October 28. I don't usually include game outages in the Outages section since they're so common and there's not usually much information to learn from, but I sure do like a good post-incident report. Thanks, folks!

David Baszucki — Roblox

When you’re sending small TCP packets, two optimizations can conspire to introduce an artificial 40-millisecond (not megasecond…) delay.

Vorner
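The two optimizations in this well-known interaction are usually Nagle's algorithm on the sender and delayed ACKs on the receiver. Nagle holds back a small write until earlier data is acknowledged, while the receiver waits before sending that ACK. The linked post may frame it differently.

A common mitigation is to disable Nagle's algorithm with `TCP_NODELAY`. This is a minimal sketch using Python's standard `socket` module. The `connect_low_latency` helper is a hypothetical name.

```python
import socket


def connect_low_latency(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection with Nagle's algorithm disabled.

    With TCP_NODELAY set, small writes go out immediately instead of waiting
    for earlier unacknowledged data to be ACKed, which avoids stalling on the
    peer's delayed ACK.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

An alternative that keeps Nagle enabled is to build the whole request in memory and send it with a single `sendall()`. That avoids the write-write-read pattern that triggers the stall.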

Here’s Google’s follow-up report for their October 25-26 Meet outage.

Should you count failed requests toward your SLI if the client retries and succeeds? A good argument can be made on either side.

u/Sufficient_Tree4275 and other Reddit users
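The two positions give noticeably different numbers on the same traffic. This sketch uses a made-up request log in which retries share a request ID. It computes the SLI once per attempt and once per logical request. The log format and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical log of (request_id, attempt_succeeded); retries share a request_id.
attempts = [
    ("a", True),
    ("b", False), ("b", True),                # failed once, succeeded on retry
    ("c", False), ("c", False), ("c", True),  # succeeded on the third try
    ("d", False), ("d", False),               # never succeeded
]


def attempt_sli(log):
    """Fraction of individual attempts that succeeded: every failure counts."""
    return sum(ok for _, ok in log) / len(log)


def request_sli(log):
    """Fraction of logical requests that eventually succeeded: retried failures are forgiven."""
    outcome = defaultdict(bool)
    for req_id, ok in log:
        outcome[req_id] |= ok
    return sum(outcome.values()) / len(outcome)


print(f"per-attempt SLI: {attempt_sli(attempts):.3f}")
print(f"per-request SLI: {request_sli(attempts):.3f}")
```

Here the per-attempt SLI is 0.375 and the per-request SLI is 0.75. The per-request view is closer to what a patient client experiences. However, it hides the extra latency and load that retries add, which is one reason the argument can go either way.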

Mercari restructured its SRE team, moving toward an embedded model to adapt to its growing microservice architecture.

ShibuyaMitsuhiro — Mercari

There’s a really great discussion in this episode about leaving slack in the system in the form of bits of capacity and inefficiency that can be drawn upon to buy time during an outage.

Courtney Nash, with guests Liz Fong-Jones and Fred Hebert — Verica

Here’s how non-SREs can use SRE principles to improve their systems.

Laurel Frazier — Transposit

Outages







SRE Weekly · PO Box 253 · South Lancaster, MA 01561-0253 · USA

