SRE Weekly - SRE Weekly Issue #331
Articles
I’ve been listening to this podcast this week and I love it! Each episode covers a disaster, safety theory, and other topics — with no ads. Their site is down right now, but the podcast is available on the usual platforms.
Drew Rae — DisasterCast
If we want to get folks to own their code in production, we need to teach them how to think like an SRE.
Boris Cherkasky
Let’s look at three mistakes I’ve made during those stressful moments during the beginning of an incident — and discuss how you can avoid making them.
The mistakes are:
Mistake 1: We didn’t have a plan.
Mistake 2: We weren’t production ready.
Mistake 3: We fell down a cognitive tunnel.
Robert Ross — FireHydrant
At what point does your canary test indicate failure? Should the criteria be the same as your normal production alerting?
Øystein Blixhavn
This is a followup to a previous article about on-call health. In this one, the author shares metrics about the number of alerts and discusses whether this number is useful.
  Fred Hebert — Honeycomb
Their dashboard crashed for 50% of user sessions, so they had a lot of work ahead of them. Find out how they got crash-free sessions to 99.9% and improved their time to respond to incidents.
  Sandesh Damkondwar — Razorpay
Rogers Communications, a major telecom in Canada, had a country-wide outage earlier this month. I don’t normally include telecom outages in the Outages section because they rarely share information that we can learn from.
This time, Rogers released a (redacted) report on their outage, and this Twitter thread summarizes the key points.
@atoonk on Twitter
Outages
- Microsoft Teams and Office 365
- Microsoft blames storage error for Teams outage
- Google Cloud Storage
- Google Cloud europe-west2 region
-
Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.
-
|
Older messages
SRE Weekly Issue #330
Monday, July 18, 2022
View on sreweekly.com Thanks for all the well-wishes as I took a sick day last week. I'm feeling much better! A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒.
SRE Weekly Issue #329
Monday, July 4, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #328
Monday, June 27, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #327
Monday, June 20, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #326
Monday, June 13, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
You Might Also Like
NVIDIA AI Software Party at a Hardware Show
Sunday, January 12, 2025
A tremendous number of AI software releases at CES. ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Laravel 11.37, Recurr, Streaming Responses, and more! - №547
Sunday, January 12, 2025
Your Laravel week in review ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
NATO Alphabet Converter/Huge If True/Framework for letting "it" go
Sunday, January 12, 2025
Recomendo - issue #445 ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Kotlin Weekly #441
Sunday, January 12, 2025
ISSUE #441 12th of January 2025 Announcements Become a KotlinConf 2025 volunteer! The KotlinConf has started a Call for Volunteers to help out at the conference in May! If you are interested, check out
Healthy life, Meta's AI and legibility
Saturday, January 11, 2025
Neologism #25, 11.01.2024 ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Daily Coding Problem: Problem #1665 [Medium]
Saturday, January 11, 2025
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by LinkedIn. A wall consists of several rows of bricks of various integer lengths and
📊 Every Smartphone I've Ever Owned, Ranked — This Tiny Smart Remote Is the Most Exciting Thing at CES
Saturday, January 11, 2025
Also: 5 Android Notification Features to Make Your Day Easier, and More! How-To Geek Logo January 11, 2025 Did You Know On March 12, 1951, a curious thing happened. In the United States and the United
Ranked | The Top Grossing Movies Worldwide in 2024 🎬
Saturday, January 11, 2025
Established IP dominated the 2024 box office, with top films mostly being sequels, spin-offs, or franchise continuations. View Online | Subscribe | Download Our App FEATURED STORY Ranked: Top Grossing
📖 Your Step-by-Step Guide to Securing AI in the Enterprise
Saturday, January 11, 2025
January 11, 2025 | Read Online Subscribe | Advertise Good Morning. Welcome to this special edition of The Deep View, brought to you in collaboration with Tines. When it comes to adopting AI securely,
🐍 New Python tutorials on Real Python
Saturday, January 11, 2025
Hey there, There's always something going on over at Real Python as far as Python tutorials go. Here's what you may have missed this past week: Iterators and Iterables in Python: Run Efficient