SRE Weekly Issue #329
Articles
A primer on what makes a good runbook.
Runbooks are most effective when they are readily available, easily actionable, up to date, and accurate.
Cortex
In this article, we describe the architecture and implementation of our SRE infrastructure, how it is used and how it was adopted.
Philipp Gündisch and Vladyslav Ukis — Siemens
After an explanation of tech debt, this article goes into a possible solution: having on-call folks fix lingering problems in between pages.
Dormain Drewitz — The New Stack
I’ve read plenty of articles about service ownership, but this one has something new: a discussion of how to divvy up a monolith into separate “services” for teams to own.
Hannah Culver — PagerDuty
The folks at Sendinblue have chronicled their journey to better incident response, and there’s a lot here to learn from.
Tanguy Antoine — Sendinblue
Incidents will always happen, but thankfully they have plenty of upsides, as this article explains.
Andre King — Rootly
This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.
You’re not getting paged. Is it because you’ve fixed all the things, or has your alerting atrophied?
Boris Cherkasky
The folks at incident.io are here with the results of their survey of on-call practices. I like the focus on compensation for being on-call.
incident.io
Outages
- Netflix briefly went down for some users after the new ‘Stranger Things’ episodes debuted Friday morning, according to outage reports
- GitHub
- Zebrium
  I noticed this one while trying to read one of their articles. I was getting NXDOMAIN trying to resolve zebrium.com.