SRE Weekly Issue #441
This post aims to shed some light on why we migrated to Prometheus, as well as outline some of the technical challenges we faced during the process.
Eddie Bracho — Mixpanel
Amazon posted this thorough summary of a multi-service outage at the end of July. The impact stemmed from a complex distributed system failure in Kinesis.
Amazon
This team shows what they did to ferret out and eliminate occurrences of N+1 DB queries triggered by a single request in their Django app.
Gonzalo Lopez — Mixpanel
The folks at incident.io share how they baked observability into the infrastructure for their new on-call tool.
Note for folks using screen readers: there's a picture without alt-text that contains the following important text:
- Overview dashboard
- System dashboard
- Logs
- Tracing
It's right after this sentence:
Those pieces fit together something like this:
Martha Lambert — incident.io
An overview of deterministic simulation testing (DST), which was a new concept for me. It's about running simulations to try to find faults in a distributed system.
Phil Eaton
If you build software that people depend on and are not operationally responsible for it (particularly on-call): you should be. 🛑
I like the way this one draws from the author's experience, plus the emphasis on feedback loops.
Amin Astaneh
Retries help increase service availability. However, if not done right, they can have a devastating impact on the service and elongate recovery time.
Rajesh Pandey
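As a rough illustration of what "done right" can mean, here is a sketch of one common mitigation: a bounded number of attempts with capped exponential backoff and full jitter. This is an assumption about good practice in general, not necessarily the approach the linked article recommends. The function names and default values are hypothetical.

```python
import random
import time


def backoff_delays(max_attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield the sleep before each retry: capped exponential backoff with full jitter."""
    for attempt in range(max_attempts - 1):
        # The ceiling doubles each attempt up to `cap`; jitter picks a point below it
        # so that many clients do not retry in lockstep.
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_retries(operation, max_attempts=4, sleep=time.sleep, **backoff):
    """Call operation() up to max_attempts times, sleeping between failures."""
    delays = backoff_delays(max_attempts, **backoff)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)
```

The attempt limit keeps a struggling service from being hit indefinitely, and the jitter spreads retries out over time. A real client would also retry only on errors known to be transient, which this sketch does not attempt.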
Keepalive pings are critical in any system that uses TCP, since connections can hang at any point. I've been meaning to write this one for years!
Lex Neva — Honeycomb
Full disclosure: Honeycomb is my employer.
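For a concrete picture, here is a small sketch of turning on kernel-level TCP keepalive for a socket in Python. This is only one form of keepalive, and the linked article may well be talking about application-level pings instead. The helper name and the timing values are hypothetical, and the fine-grained options are platform-specific, so the code sets only the ones the local `socket` module exposes.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle TCP connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning knobs are not available on every platform, so each is optional:
    #   TCP_KEEPIDLE  - seconds of idleness before the first probe
    #   TCP_KEEPINTVL - seconds between probes
    #   TCP_KEEPCNT   - unanswered probes before the connection is considered dead
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Without the tuning options, many systems wait a long time (commonly two hours by default on Linux) before sending the first probe, which is often too slow to detect a hung connection in practice.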