SRE Weekly Issue #236

A message from our sponsor, StackHawk:

Add application security checks with GitHub Actions. Check out the video on how.
https://www.stackhawk.com/blog/application-security-with-github-actions?utm_source=SREWeekly

Articles

A nice juicy post-incident report from the archives. Remember the first time you took down production?

Mads Hartmann — Glitch

During testing, a new power transmission link was accidentally overloaded by a factor of ~14x, with far-reaching but ultimately well-managed effects.

Thanks to Jesper Lundkvist for this one.

As Facebook moved from a static to an auto-scaled web pool, they had to try to predict their expected demand as accurately as possible.

Daniel Boeve, Kiryong Ha, and Anca Agape — Facebook

The key lesson: keep your migrations from depending on production code, since later changes to that code could inadvertently change what the migrations do.

Frank Lin — Octopus Deploy

Cloudflare uses an interesting multi-layered approach to mitigating attacks.

Omer Yoachimik — Cloudflare

The availability/reliability distinction in this article is thought-provoking.

Emily Arnott — Blameless

2020 has shown the value of adaptive capacity. 2021 will show whether or not adaptive capacity can be sustained.

This article (not a video or podcast despite the name) also focuses on the increasing importance of learning from incidents.

Dr. Richard Cook — Adaptive Capacity Labs

What is resilience engineering? What does a resilience engineer do? Are there principles of resilience engineering? If so, what are they? What makes it possible to engineer resilience?

This academic paper uses a case study to show how a company engineered the resilience of their system in response to a series of incidents.

Richard I. Cook and Beth Adele Long — Applied Ergonomics

Outages

  • Google Drive
    • This is a post-analysis for two outages, one from this past week and the other from the week before.
  • Instagram
  • Facebook
  • Discord
  • Fastly
  • Gandi
    • Postmortem regarding the Network Incident from September 15, 2020 on IAAS and PAAS FR-SD3, FR-SD5, and FR-SD6

      A layer 2 network loop was accidentally introduced, on two separate occasions.

      Sébastien Dupas — Gandi

  • Azure
    • This was an outage on Sept. 14 in the UK South region. A cooling system was shut off in error during a maintenance procedure.







