SRE Weekly - SRE Weekly Issue #441

A message from our sponsor, FireHydrant:

FireHydrant has acquired Blameless! The addition of Blameless' enterprise capabilities combined with FireHydrant's platform creates the most comprehensive enterprise incident management solution in the market.

https://firehydrant.com/blog/press-release-firehydrant-acquires-blameless-to-further-solidify-enterprise/

This post aims to shed some light on why we migrated to Prometheus, as well as outline some of the technical challenges we faced during the process.

  Eddie Bracho — Mixpanel

Amazon posted this thorough summary of a multi-service outage that occurred at the end of July. The impact stemmed from a complex distributed system failure in Kinesis.

  Amazon

This team shows what they did to ferret out and eliminate occurrences of N+1 DB queries triggered by a single request in their Django app.

  Gonzalo Lopez — Mixpanel
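
For readers new to the term, here is a minimal sketch of the N+1 pattern using Python's built-in sqlite3 module rather than Django. The schema, table names, and helper functions are made up for illustration and are not taken from the linked post. The first function issues one query for the list plus one per row; the second gets the same rows with a single JOIN.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES author(id)
        );
        INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO book VALUES
            (1, 'Notes', 1), (2, 'Compilers', 2), (3, 'Engines', 1);
        """
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the books, then one extra query per book."""
    queries = 1
    books = conn.execute(
        "SELECT title, author_id FROM book ORDER BY id"
    ).fetchall()
    rows = []
    for title, author_id in books:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((title, name))
    return rows, queries


def titles_joined(conn):
    """Same result from a single query that joins the two tables."""
    rows = conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()
    return rows, 1
```

With three books, the first version runs four queries and the second runs one. In Django, the usual tools for collapsing such queries are select_related and prefetch_related on a QuerySet; whether the linked post relies on them is not stated here.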

The folks at incident.io share how they baked observability into the infrastructure for their new on-call tool.

Note for folks using screen readers: there's a picture without alt-text that contains the following important text:

  1. Overview dashboard
  2. System dashboard
  3. Logs
  4. Tracing

It's right after this sentence:

Those pieces fit together something like this:

  Martha Lambert — incident.io

An overview of Deterministic Simulation Testing (DST), which was a new concept for me. It's about running simulations to try to find faults in a distributed system.

  Phil Eaton

If you build software that people depend on and are not operationally responsible for it (particularly on-call): you should be. 🛑

I like the way this one draws from the author's experience, plus the emphasis on feedback loops.

  Amin Astaneh

Retries help increase service availability. However, if not done right, they can have a devastating impact on the service and elongate recovery time.

  Rajesh Pandey
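
As a rough illustration of retries that are less likely to pile onto a struggling service, here is a sketch of capped exponential backoff with jitter and a hard limit on attempts. This is one common approach, not necessarily the one the linked article recommends; the function name, parameters, and default values are assumptions chosen for the example.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on any exception.

    The wait before each retry is a random fraction of an exponentially
    growing ceiling (capped at max_delay), so many clients retrying at
    once do not all hit the service at the same instant. After
    max_attempts failures the last exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: give up instead of adding load
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)
```

The sleep and rng parameters exist only so the behavior can be exercised without real waiting. Real code would usually also retry only on errors known to be transient, rather than on every exception as this sketch does.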

Keepalive pings are critical in any system that uses TCP, since connections can hang at any point. I've been meaning to write this one for years!

  Lex Neva — Honeycomb

  Full disclosure: Honeycomb is my employer.
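
For context, one way to get keepalive probes on an otherwise idle TCP connection is to ask the operating system for them. The sketch below uses Python's standard socket module; the helper name and the timing values are assumptions for illustration, and the linked article may well be about application-level pings instead. The per-connection tuning options are not available on every platform, so they are applied only where they exist.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for sock.

    idle, interval, and count are illustrative values: seconds of
    idleness before the first probe, seconds between probes, and the
    number of unanswered probes before the connection is dropped.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not defined on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # platform rejects this option; keep the OS default
    return sock
```

Without tuning, operating system defaults often wait a long time before the first probe, which is one reason systems also send their own pings at the application layer.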







SRE Weekly, a production of Tinker Tinker Tinker, LLC · PO Box 253 · South Lancaster, MA 01561-0253 · USA
