SRE Weekly Issue #273
Articles
What indeed? It depends on who you ask.
Quentin Rousseau — Rootly
This academic paper explains Google’s efforts toward identifying “mercurial” CPU cores: cores that make erroneous computations.
[…] we observe on the order of a few mercurial cores per several thousand machines […]
This one blew my mind:
A deterministic AES mis-computation, which was “self-inverting”: encrypting and decrypting on the same core yielded the identity function, but decryption elsewhere yielded gibberish.
Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler, and Amin Vahdat — Google
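To make the "self-inverting" behavior concrete, here is a minimal sketch. It is a toy XOR stream cipher, not AES, and it does not reproduce the mechanism described in the paper. The `faulty` flag is a hypothetical stand-in for a core that deterministically corrupts one byte of the key. Because the fault is the same on every run, the faulty core can undo its own output, while a healthy core cannot.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic keystream from the key (toy construction, not AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes, faulty: bool = False) -> bytes:
    """Encrypt or decrypt data; XOR with the keystream is its own inverse.

    faulty=True models a hypothetical core that always flips the same bit
    in the first key byte before using it (an assumption for illustration).
    """
    if faulty:
        key = bytes([key[0] ^ 0x01]) + key[1:]
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"0123456789abcdef"
message = b"attack at dawn"

# Encrypt on the faulty core.
ciphertext = xor_cipher(key, message, faulty=True)

# Decrypting on the same faulty core applies the same wrong keystream again,
# so the error cancels and the round trip looks healthy.
same_core = xor_cipher(key, ciphertext, faulty=True)

# Decrypting on a healthy core uses the correct keystream and does not
# recover the message.
other_core = xor_cipher(key, ciphertext, faulty=False)

print(same_core == message)   # round trip on the faulty core
print(other_core == message)  # round trip across cores
```

One takeaway from the sketch: a round-trip self-test run entirely on one core would not detect this kind of fault, since the check only fails when another core is involved.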
The decisions, non-decisions, and workarounds that we implement now can have lasting effects on the Internet as a whole.
Mark Nottingham — Fastly
Full disclosure: Fastly is my employer.
A great intro to the topic of resilience engineering. Hint: resilience != high availability.
Piet van Dongen — Luminis Arnhem
When you include people in your definition of “the system”, something that looked like a system failure where humans had to “step in” is actually a success in which the system adapted.
Lorin Hochstein
I find the way this author presented this argument especially convincing. My favorite part is the real-world story toward the end.
Rachel by the Bay
Facebook presents their method for finding and dealing with PCIe errors in their infrastructure.
Ashwin Poojary, Bill Holland, Makan Diarra, and Ray Park — Facebook
Overflow of a 32-bit integer primary key caused a security issue.
Scott Sanders — GitHub
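As a rough illustration of the arithmetic behind a 32-bit overflow, the sketch below shows what happens when an ID larger than 2^31 - 1 is forced into a signed 32-bit value. This is not a description of the GitHub incident; `to_int32` is a hypothetical helper, and whether a given database wraps, truncates, or rejects such a value varies by system and configuration.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_int32(value: int) -> int:
    """Reduce an integer to a signed 32-bit value, the way a C int32_t would."""
    return ctypes.c_int32(value).value


# One past the maximum wraps around to the most negative value.
next_id = INT32_MAX + 1
wrapped = to_int32(next_id)
print(next_id, "->", wrapped)

# Two distinct 64-bit IDs can collapse to the same 32-bit value, which is
# how one record could be mistaken for another.
small_id = 5
large_id = 2**32 + 5
print(to_int32(small_id) == to_int32(large_id))
```

The collision case is the one with security implications in general: any code path that narrows an ID to 32 bits may end up treating two different records as the same one.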
This caught my eye. I’ve seldom been in an on-call rotation with shifts that were not a week or two at a time.
The optimal frequency for being on call is about three days a month.
There’s also a good discussion of paying for on-call shifts, which, in my experience, goes a long way toward making on-call more palatable.
Christine Patton — SoundCloud
Outages
- HBO Max
- Apple Card
- Sling TV
- Google Meet
- GitHub
- Discord
  - Discord had several outages this week.