SRE Weekly Issue #236
Articles
A nice juicy post-incident report from the archives. Remember the first time you took down production?
Mads Hartmann — Glitch
During testing, a new power transmission link was accidentally overloaded by a factor of ~14x, with far-reaching but ultimately well-managed effects.
Thanks to Jesper Lundkvist for this one.
As Facebook moved from a static to an auto-scaled web pool, they had to try to predict their expected demand as accurately as possible.
Daniel Boeve, Kiryong Ha, and Anca Agape — Facebook
The key lesson: make sure your migrations avoid using parts of the production code, since later changes to that code could inadvertently change what the migration does.
Frank Lin — Octopus Deploy
Cloudflare uses an interesting multi-layered approach to mitigating attacks.
Omer Yoachimik — Cloudflare
The availability/reliability distinction in this article is thought-provoking.
Emily Arnott — Blameless
2020 has shown the value of adaptive capacity. 2021 will show whether or not adaptive capacity can be sustained.
This article (not a video or podcast despite the name) also focuses on the increasing importance of learning from incidents.
Dr. Richard Cook — Adaptive Capacity Labs
What is resilience engineering? What does a resilience engineer do? Are there principles of resilience engineering? If so, what are they? What makes it possible to engineer resilience?
This academic paper uses a case study to show how a company engineered the resilience of their system in response to a series of incidents.
Richard I. Cook and Beth Adele Long — Applied Ergonomics
Outages
- Google Drive
- This is a post-analysis for two outages, one from this past week and the other from the week before.
- Discord
- Fastly
- Gandi
- Postmortem regarding the Network Incident from September 15, 2020 on IAAS and PAAS FR-SD3, FR-SD5, and FR-SD6
  A layer 2 network loop was accidentally introduced, on two separate occasions.
  Sébastien Dupas — Gandi
- Azure
- This was an outage on Sept. 14 in the UK South region. A cooling system was shut off in error during a maintenance procedure.