Architecture Weekly - Distributed Locking: A Practical Guide
Was your data ever mysteriously overwritten? No? Think again. Have you noticed conflicting updates to the same data? Still nope? Lucky you! I've had cases where my data changed, and it looked like Marty was travelling back to the future. First it was correct, then it was wrong, then correct again. Most of the time, twin services were writing to the same place.

In distributed systems, coordination is crucial. Whenever you have multiple processes (or services) that could update or read the same data simultaneously, you risk data corruption, race conditions, or unwanted duplicates. The popular solution is to use distributed locks - a mechanism ensuring only one process can operate on a resource at a time.

A distributed lock ensures that if one actor (node, service instance, etc.) changes a shared resource (like a database record, file, or external service), no other node can step in until the first node is finished. While the concept is straightforward, implementing it across multiple machines requires careful design and a robust failure strategy.

Today, we’ll try to discuss it and:
By the end, you should have a decent grasp of distributed locks, enough to make informed decisions about whether (and how) to use them in your architecture.

1. Why Distributed Locks Matter

When you scale an application to multiple machines or microservices, each one might update the same resource simultaneously. That can lead to writers overwriting each other. It’s the classic concurrency problem: who’s in charge of updating shared data? Without a strategy to coordinate these updates, you risk inconsistent results. For example, you might have:
A distributed lock is the simplest way to say, “Only one node can modify this resource until it’s finished.” Other nodes wait or fail immediately rather than attempting parallel writes.

It can also be seen as Leader Election Lite: if you only need “one node at a time” for a particular task, a plain lock is sometimes enough - there is no need for a full-blown leader election framework.

Without locks, you can get unpredictable states - like a read model flipping from correct to incorrect or a file partially overwritten by multiple workers. Locks sacrifice a bit of parallelism for the certainty that no two nodes update the same resource simultaneously. In many cases, that’s the safest trade-off, especially if data correctness is paramount.

There are other ways to handle concurrency (like idempotent actions or write-ahead logs). Still, a distributed lock is often the most direct solution when you truly need to avoid simultaneous writes.

2. How Locks Typically Work
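As a rough illustration of that lifecycle, here is a minimal single-process sketch in Python. It is not any particular tool's API: `InMemoryLockStore`, `try_acquire`, `release`, and `acquire_with_wait` are hypothetical names, and the dictionary stands in for a shared store that every node would talk to. In a real deployment, the "check if free, then set" step has to be atomic inside that shared store. The sketch assumes a lock carries an owner token and a TTL, so a crashed holder cannot block others forever.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LockEntry:
    owner: str         # unique token identifying the current holder
    expires_at: float  # clock time after which the lock counts as free


class InMemoryLockStore:
    """Single-process stand-in for a shared lock store (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, LockEntry] = {}

    def try_acquire(self, resource: str, ttl_seconds: float) -> Optional[str]:
        """Fail-fast acquire: return an owner token, or None if the lock is held."""
        now = self._clock()
        entry = self._locks.get(resource)
        if entry is not None and entry.expires_at > now:
            return None  # held by someone else and not yet expired
        token = uuid.uuid4().hex
        self._locks[resource] = LockEntry(token, now + ttl_seconds)
        return token

    def release(self, resource: str, token: str) -> bool:
        """Release only if the caller still owns the lock."""
        entry = self._locks.get(resource)
        if entry is None or entry.owner != token:
            return False  # lock expired and was taken over, or was never held
        del self._locks[resource]
        return True


def acquire_with_wait(
    store: InMemoryLockStore,
    resource: str,
    ttl_seconds: float,
    timeout_seconds: float,
    poll_seconds: float = 0.05,
) -> Optional[str]:
    """Waiting acquire: retry until the lock is free or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        token = store.try_acquire(resource, ttl_seconds)
        if token is not None or time.monotonic() >= deadline:
            return token
        time.sleep(poll_seconds)
```

A caller would take the token from `try_acquire` (or `acquire_with_wait`), do its work, and pass the same token to `release`. Checking the token on release is a deliberate choice: if the TTL ran out and another node took the lock, the late holder's `release` returns `False` instead of deleting someone else's lock.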
The basic flow would look like: [Figure: basic lock flow]

And the acquisition part with TTL handling: [Figure: lock acquisition with TTL handling]

3. Tools for Distributed Locking

There are many tools for distributed locking; let's look at the most popular ones in each category.
Distributed locks all share a common goal: ensure only one node does a particular thing at any given time. However, each of the tools mentioned approaches the problem with its own design, strengths, and failover behaviour. Let’s look at each tool’s big-picture purpose (why you’d even consider it), then move on to how it implements (or approximates) a lock. Lastly, let’s discuss a few technical details that matter once you start coding or troubleshooting...