SRE Weekly - SRE Weekly Issue #331
Articles
I’ve been listening to this podcast this week and I love it! Each episode covers a disaster, safety theory, and other topics — with no ads. Their site is down right now, but the podcast is available on the usual platforms.
Drew Rae — DisasterCast
If we want to get folks to own their code in production, we need to teach them how to think like an SRE.
Boris Cherkasky
Let’s look at three mistakes I’ve made during those stressful moments during the beginning of an incident — and discuss how you can avoid making them.
The mistakes are:
Mistake 1: We didn’t have a plan.
Mistake 2: We weren’t production ready.
Mistake 3: We fell down a cognitive tunnel.
Robert Ross — FireHydrant
At what point does your canary test indicate failure? Should the criteria be the same as your normal production alerting?
Øystein Blixhavn
This is a followup to a previous article about on-call health. In this one, the author shares metrics about the number of alerts and discusses whether this number is useful.
  Fred Hebert — Honeycomb
Their dashboard crashed for 50% of user sessions, so they had a lot of work ahead of them. Find out how they got crash-free sessions to 99.9% and improved their time to respond to incidents.
  Sandesh Damkondwar — Razorpay
Rogers Communications, a major telecom in Canada, had a country-wide outage earlier this month. I don’t normally include telecom outages in the Outages section because they rarely share information that we can learn from.
This time, Rogers released a (redacted) report on their outage, and this Twitter thread summarizes the key points.
@atoonk on Twitter
Outages
- Microsoft Teams and Office 365
- Microsoft blames storage error for Teams outage
- Google Cloud Storage
- Google Cloud europe-west2 region
-
Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.
-
|
Older messages
SRE Weekly Issue #330
Monday, July 18, 2022
View on sreweekly.com Thanks for all the well-wishes as I took a sick day last week. I'm feeling much better! A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒.
SRE Weekly Issue #329
Monday, July 4, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #328
Monday, June 27, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #327
Monday, June 20, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
SRE Weekly Issue #326
Monday, June 13, 2022
View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and
You Might Also Like
JSK Daily for Nov 25, 2024
Monday, November 25, 2024
JSK Daily for Nov 25, 2024 View this email in your browser A community curated daily e-mail of JavaScript news JavaScript Certification Black Friday Offer – Up to 54% Off! Certificates.dev, the trusted
Ranked | How Americans Rate Business Figures 📊
Monday, November 25, 2024
This graphic visualizes the results of a YouGov survey that asks Americans for their opinions on various business figures. View Online | Subscribe Presented by: Non-consensus strategies that go where
Spyglass Dispatch: Apple Throws Their Film to the Wolves • The AI Supercomputer Arms Race • Sony's Mobile Game • The EU Hunts Bluesky • Bluesky Hunts User Trust • 'Glicked' Pricked • One Massive iPad
Monday, November 25, 2024
Apple Throws Their Film to the Wolves • The AI Supercomputer Arms Race • Sony's Mobile Game • The EU Hunts Bluesky • Bluesky Hunts User Trust • 'Glicked' Pricked • One Massive iPad The
Daily Coding Problem: Problem #1619 [Hard]
Monday, November 25, 2024
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Google. Given two non-empty binary trees s and t , check whether tree t has exactly the
Unpacking “Craft” in the Software Interface & The Five Pillars of Creative Flow
Monday, November 25, 2024
Systems Over Substance, Anytype's autumn updates, Ghost's progress with its ActivityPub integration, and a lot more in this week's issue of Creativerly. Creativerly Unpacking “Craft” in the
What Investors Want From AI Startups in 2025
Monday, November 25, 2024
Top Tech Content sent at Noon! How the world collects web data Read this email in your browser How are you, @newsletterest1? 🪐 What's happening in tech today, November 25, 2024? The HackerNoon
GCP Newsletter #426
Monday, November 25, 2024
Welcome to issue #426 November 25th, 2024 News LLM Official Blog Vertex AI Announcing Mistral AI's Large-Instruct-2411 on Vertex AI - Google Cloud has announced the availability of Mistral AI's
⏳ 36 Hours Left: Help Get "The Art of Data" Across the Finish Line 🏁
Monday, November 25, 2024
Visual Capitalist plans to unveal its secrets behind data storytelling, but only if the book hits its minimum funding goal. View Online | Subscribe | Download Our App We Need Your Help Only 36 Hours
DeveloPassion's Newsletter #180 - Black Friday Week
Monday, November 25, 2024
Edition 180 of my newsletter, discussing Knowledge Management, Knowledge Work, Zen Productivity, Personal Organization, and more! Sébastien Dubois DeveloPassion's Newsletter DeveloPassion's
Meet HackerNoon's Latest Features: Boost Stories with Translations, Speech-to-Text & More
Monday, November 25, 2024
Hey, Hacker! HackerNoon's monthly product update is here! Get ready for a new version of the mobile app, more translation developments, a new AI Gallery, backend moves, and more! 🚀 This product