SRE Weekly Issue #331
Articles
I’ve been listening to this podcast this week and I love it! Each episode covers a disaster, safety theory, and other topics — with no ads. Their site is down right now, but the podcast is available on the usual platforms.
Drew Rae — DisasterCast
If we want to get folks to own their code in production, we need to teach them how to think like an SRE.
Boris Cherkasky
Let’s look at three mistakes I’ve made in the stressful first moments of an incident, and discuss how you can avoid making them.
The mistakes are:
Mistake 1: We didn’t have a plan.
Mistake 2: We weren’t production ready.
Mistake 3: We fell down a cognitive tunnel.
Robert Ross — FireHydrant
At what point does your canary test indicate failure? Should the criteria be the same as your normal production alerting?
Øystein Blixhavn
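One way to make the canary question concrete is to judge the canary relative to the current baseline rather than against a fixed production alert threshold. The sketch below is only an illustration of that idea and is not taken from the linked article; the names `Window` and `canary_verdict` and the default values for `max_ratio`, `noise_floor`, and `min_requests` are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts observed over the same time window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # An empty window is treated as a zero error rate.
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: Window, canary: Window,
                   max_ratio: float = 2.0,      # assumed tolerance, tune per service
                   noise_floor: float = 0.001,  # assumed minimum comparison rate
                   min_requests: int = 500) -> str:
    """Return "pass", "fail", or "insufficient_data" for a canary window."""
    # Too little canary traffic: do not decide either way.
    if canary.requests < min_requests:
        return "insufficient_data"
    # Compare against the baseline, but never against a rate below the
    # noise floor, so a near-zero baseline does not fail on a single error.
    allowed = max(baseline.error_rate, noise_floor) * max_ratio
    return "fail" if canary.error_rate > allowed else "pass"


if __name__ == "__main__":
    baseline = Window(requests=100_000, errors=100)
    print(canary_verdict(baseline, Window(requests=1_000, errors=5)))
```

A relative check like this can catch a regression that is still well under the paging threshold, which is one argument for keeping canary criteria separate from normal production alerting.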
This is a follow-up to a previous article about on-call health. In this one, the author shares metrics about the number of alerts and discusses whether this number is useful.
  Fred Hebert — Honeycomb
Their dashboard crashed for 50% of user sessions, so they had a lot of work ahead of them. Find out how they got crash-free sessions to 99.9% and improved their time to respond to incidents.
  Sandesh Damkondwar — Razorpay
Rogers Communications, a major telecom in Canada, had a country-wide outage earlier this month. I don’t normally include telecom outages in the Outages section because they rarely share information that we can learn from.
This time, Rogers released a (redacted) report on their outage, and this Twitter thread summarizes the key points.
@atoonk on Twitter
Outages
- Microsoft Teams and Office 365
- Microsoft blames storage error for Teams outage
- Google Cloud Storage
- Google Cloud europe-west2 region
  “Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.”