SRE Weekly Issue #416
What can we, in turn, learn from some of the most honest and blameless—and public—postmortems of the last few years?
They cover incidents from GitLab, Tarsnap, Roblox, and Cloudflare with great summaries and takeaways.
The Hacker News
My favorite part of this interview is when Vanessa describes parenting twin babies as constant incident response.
Shane Hastie — InfoQ
Here follow some lessons I’ve learned from the trenches in small start-ups and larger engineering teams, to improve your on-call shift experience and remediation time for production issues and make sure you’re spending on-call efforts on what has the most impact.
Alex Wauters
Doing your chaos experiments in a non-production environment can feel safer, but what are you giving up?
Sam Rossoff — Gremlin
Sometimes, shell is just the right tool for the job.
Amin Astaneh — Certo Modo
Catherine from Mastodon summarized this incident report beautifully:
this is one of the most violently unhinged CSB reports i’ve ever read […]
while investigating an explosion at a facility, CSB staff tried to prevent another explosion of the same kind in the same facility, and being unable to convince the workers to not cause it, ended up hiding behind a shipping container
U.S. Chemical Safety and Hazard Investigation Board
This one’s about why people tend to want a “SPoG” (single pane of glass) and what we should want instead. Bonus points for the Star Trek reference.
Nočnica Mellifera — Checkly
Right in the middle of migrating from one datacenter to an HA pair of new datacenters, one of the new ones failed. They had to quickly do a partial rollback of the migration to ride out the outage.
Gauthier François — Doctolib
Today, we are thrilled to announce the release of bpftop, a command-line tool designed to streamline the performance optimization and monitoring of eBPF programs.
Jose Fernandez — Netflix