SRE Weekly Issue #236

View on sreweekly.com

A message from our sponsor, StackHawk:

Add application security checks with GitHub Actions. Check out the video to see how.
https://www.stackhawk.com/blog/application-security-with-github-actions?utm_source=SREWeekly

Articles

A nice juicy post-incident report from the archives. Remember the first time you took down production?

Mads Hartmann — Glitch

While a new power transmission link was being tested, it was accidentally overloaded by a factor of roughly 14, with far-reaching but ultimately well-managed effects.

Thanks to Jesper Lundkvist for this one.

As Facebook moved from a static to an auto-scaled web pool, they had to predict their demand as accurately as possible.

Daniel Boeve, Kiryong Ha, and Anca Agape — Facebook

The key lesson: make sure your migrations don't call into your production code, because later changes to that code could inadvertently change what the migration does.

Frank Lin — Octopus Deploy

Cloudflare uses an interesting multi-layered approach to mitigating attacks.

Omer Yoachimik — Cloudflare

The availability/reliability distinction in this article is thought-provoking.

Emily Arnott — Blameless

2020 has shown the value of adaptive capacity. 2021 will show whether or not adaptive capacity can be sustained.

This article (not a video or podcast despite the name) also focuses on the increasing importance of learning from incidents.

Dr. Richard Cook — Adaptive Capacity Labs

What is resilience engineering? What does a resilience engineer do? Are there principles of resilience engineering? If so, what are they? What makes it possible to engineer resilience?

This academic paper uses a case study to show how a company engineered the resilience of their system in response to a series of incidents.

Richard I. Cook and Beth Adele Long — Applied Ergonomics

Outages

  • Google Drive
    • This is a post-analysis for two outages, one from this past week and the other from the week before.
  • Instagram
  • Facebook
  • Discord
  • Fastly
  • Gandi
    • Postmortem regarding the Network Incident from September 15, 2020 on IAAS and PAAS FR-SD3, FR-SD5, and FR-SD6

      A layer 2 network loop was accidentally introduced on two separate occasions.

      Sébastien Dupas — Gandi

  • Azure
    • This was an outage on Sept. 14 in the UK South region. A cooling system was shut off in error during a maintenance procedure.






SRE Weekly · PO Box 253 · South Lancaster, MA 01561-0253 · USA