SRE Weekly Issue #441
This post aims to shed some light on why we migrated to Prometheus, as well as outline some of the technical challenges we faced during the process.
Eddie Bracho — Mixpanel
Amazon posted this thorough summary of a multi-service outage at the end of July. The impact stemmed from a complex distributed system failure in Kinesis.
Amazon
This team shows what they did to ferret out and eliminate occurrences of N+1 DB queries triggered by a single request in their Django app.
Gonzalo Lopez — Mixpanel
The folks at incident.io share how they baked observability into the infrastructure for their new on-call tool.
Note for folks using screen readers: there's a picture without alt-text that contains the following important text:
- Overview dashboard
- System dashboard
- Logs
- Tracing
It's right after this sentence:
Those pieces fit together something like this:
Martha Lambert — incident.io
An overview of Deterministic Simulation Testing (DST), which was a new concept for me. It's about running simulations to try to find faults in a distributed system.
Phil Eaton
If you build software that people depend on and are not operationally responsible for it (particularly on-call): you should be. 🛑
I like the way this one draws from the author's experience, plus the emphasis on feedback loops.
Amin Astaneh
Retries help increase service availability. However, if not done right, they can have a devastating impact on the service and elongate recovery time.
Rajesh Pandey
Keepalive pings are critical in any system that uses TCP, since connections can hang at any point. I've been meaning to write this one for years!
Lex Neva — Honeycomb
Full disclosure: Honeycomb is my employer.