SRE Weekly - SRE Weekly Issue #437
This week's issue is entirely focused on the CrowdStrike incident: more details on what happened, analysis, and learnings. I'll be back next week with a selection of all of the great stuff you folks have been writing while I've been off on vacation for the past two weeks—my RSS reader is packed with awesomeness!
This week, CrowdStrike posted quite a bit more detail about what happened on July 19. The short of it seems to be an argument count mismatch, but as with any incident of this sort, there are multiple contributing factors.
The report also continues the conversation about the use of kernel mode in a product such as this, amounting to a public conversation with Microsoft that is intriguing to watch from the outside.
CrowdStrike
This article has some interesting details about antitrust regulations(!) related to security vendors running code in kernel mode. There's also an intriguing story of a very similar crash on Linux endpoints running CrowdStrike's Falcon.
Note: this one is from a couple of weeks ago and some of its conjectures don't quite line up with details that have been released in the interim.
Gergely Orosz
While it mentions the CrowdStrike incident only in vague terms, this article discusses why slowly rolling out updates isn't a universal solution and can bring its own problems.
Chris Siebenmann
Some thoughts on staged rollouts and the CrowdStrike outage:
The notion we tried to get known far and wide was "nothing goes everywhere at once".
Note that this post was published before CrowdStrike's RCA, which subsequently confirmed that their channel file updates were not deployed through staged rollouts.
rachelbythebay
[...] there may be risks in your system that haven’t manifested as minor outages.
Jumping off from the CrowdStrike incident, this one asks us to look for reliability problems in parts of our infrastructure that we've grown to trust.
Lorin Hochstein
While CrowdStrike's RCA has quite a bit of technical detail, this post reminds us that we need a lot more context to really understand how an incident came to be.
Lorin Hochstein
In the future, computers will not crash due to bad software updates, even those updates that involve kernel code. In the future, these updates will push eBPF code.
I didn't realize that Microsoft is working on eBPF for Windows.
Brendan Gregg
This post isn't about what CrowdStrike should have done. Instead, I use the resources to provide context and takeaways we can apply to our teams and organizations.
Bob Walker — Octopus Deploy