SRE Weekly Issue #441
This post aims to shed some light on why we migrated to Prometheus, as well as outline some of the technical challenges we faced during the process.
Eddie Bracho — Mixpanel
Amazon posted this thorough summary of a multi-service outage at the end of July. The impact stems from a complex distributed system failure in Kinesis.
Amazon
This team shows what they did to ferret out and eliminate occurrences of N+1 DB queries triggered by a single request in their Django app.
Gonzalo Lopez — Mixpanel
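For readers who haven't met the N+1 pattern before, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The tables, the `run` helper, and the function names are all hypothetical and are not taken from the linked article; the point is only to show how one query per row adds up compared with a single join.

```python
import sqlite3

# Hypothetical schema for illustration only (not from the linked article).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
INSERT INTO author VALUES (1, 'Ann'), (2, 'Bo');
INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two approaches can be compared."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the books, then one extra query per book for its author."""
    result = []
    for _book_id, title, author_id in run(
        "SELECT id, title, author_id FROM book ORDER BY id"
    ):
        (name,) = run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
        result.append((title, name))
    return result


def titles_joined():
    """The same data fetched with a single JOIN."""
    return run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )
```

With three books, the first function issues four queries and the second issues one. In Django itself, the ORM's `select_related` and `prefetch_related` are the usual tools for collapsing these lookups, though the linked post may take a different approach.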
The folks at incident.io share how they baked observability into the infrastructure for their new on-call tool.
Note for folks using screen readers: there's a picture without alt-text that contains the following important text:
- Overview dashboard
- System dashboard
- Logs
- Tracing
It's right after this sentence:
Those pieces fit together something like this:
Martha Lambert — incident.io
An overview of deterministic simulation testing (DST), which was a new concept for me. It's about running simulations to try to find faults in a distributed system.
Phil Eaton
If you build software that people depend on and are not operationally responsible for it (particularly on-call): you should be. 🛑
I like the way this one draws from the author's experience, plus the emphasis on feedback loops.
Amin Astaneh
Retries help increase service availability. However, if not done right, they can have a devastating impact on the service and elongate recovery time.
Rajesh Pandey
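One common way to keep retries from piling onto a struggling service is to cap the number of attempts and spread them out with exponential backoff plus jitter. The sketch below is an assumption about what "done right" might look like, not the linked article's implementation; `retry_with_backoff` and its parameters are hypothetical names.

```python
import random
import time


def retry_with_backoff(
    operation,
    max_attempts=4,
    base_delay=0.1,
    max_delay=2.0,
    sleep=time.sleep,
    rng=random.random,
):
    """Call operation(), retrying on any exception with capped, jittered backoff.

    Hypothetical helper for illustration. `sleep` and `rng` are injectable
    so the behavior can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Exponential ceiling: base_delay * 2^attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": wait a random amount between 0 and the ceiling,
            # so many clients do not retry in lockstep.
            sleep(rng() * ceiling)
```

The attempt cap bounds the extra load each client can add during an outage, and the jitter avoids synchronized retry waves. A production version would likely also retry only on errors known to be transient.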
Keepalive pings are critical in any system that uses TCP, since connections can hang at any point. I've been meaning to write this one for years!
Lex Neva — Honeycomb
Full disclosure: Honeycomb is my employer.
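As a small illustration of one form of keepalive, the sketch below turns on OS-level TCP keepalive probes for a Python socket. The linked article may well focus on application-level pings instead; `enable_keepalive` and its default timings are hypothetical, and the finer-grained tuning options are only set where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Enable TCP keepalive probes on a socket (hypothetical helper).

    idle: seconds of silence before the first probe
    interval: seconds between probes
    count: unanswered probes before the connection is considered dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants below are platform-dependent, so each one is
    # applied only if this Python build defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Without the tuning options, many systems wait a long time (often hours by default) before sending the first probe, which is one reason application-level pings are a popular alternative.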