SRE Weekly Issue #331
Articles
I’ve been listening to this podcast this week and I love it! Each episode covers a disaster, safety theory, and other topics — with no ads. Their site is down right now, but the podcast is available on the usual platforms.
Drew Rae — DisasterCast
If we want to get folks to own their code in production, we need to teach them how to think like an SRE.
Boris Cherkasky
Let’s look at three mistakes I’ve made in those stressful moments at the beginning of an incident, and discuss how you can avoid making them.
The mistakes are:
Mistake 1: We didn’t have a plan.
Mistake 2: We weren’t production ready.
Mistake 3: We fell down a cognitive tunnel.
Robert Ross — FireHydrant
At what point does your canary test indicate failure? Should the criteria be the same as your normal production alerting?
Øystein Blixhavn
This is a follow-up to a previous article about on-call health. In this one, the author shares metrics on the number of alerts and discusses whether that number is useful.
  Fred Hebert — Honeycomb
Their dashboard crashed for 50% of user sessions, so they had a lot of work ahead of them. Find out how they got crash-free sessions to 99.9% and improved their time to respond to incidents.
  Sandesh Damkondwar — Razorpay
Rogers Communications, a major telecom in Canada, had a country-wide outage earlier this month. I don’t normally include telecom outages in the Outages section because they rarely share information that we can learn from.
This time, Rogers released a (redacted) report on their outage, and this Twitter thread summarizes the key points.
@atoonk on Twitter
Outages
- Microsoft Teams and Office 365
- Microsoft blames storage error for Teams outage
- Google Cloud Storage
- Google Cloud europe-west2 region
  Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.