Ordering, Grouping and Consistency in Messaging systems
We’ve drifted quite far from our Queue Broker series in recent editions, but today we’re back to it! Did you miss it? In the initial article, we introduced the QueueBroker, a central component that manages queuing, backpressure, and concurrency.
In distributed systems, queues do more than buffer requests or smooth traffic spikes. They can ensure order, enforce grouping, and provide guarantees for exclusive processing. These properties are critical when designing systems that must be both scalable and consistent.
On Friday, I published an extensive guide on my blog on how to handle idempotency in command handling. I explained the challenges in ensuring that our business operations are handled correctly without duplicates. I went through the various ways to handle it, both explicitly and generically: from building predictable state machines, through handling it in business logic, to leveraging Optimistic Concurrency, retries, and distributed locks.
I recommend handling idempotency in business logic, as it’s highly dependent on the business rules. And those rules tend to change. If we use generic handling, we always add overhead, even if most of our applications rarely have idempotency issues. That’s something to consider, as it can also increase costs in cloud environments.
Such implementations are also tricky. We need to handle thread safety and atomic updates correctly. I showed the implementation using a distributed lock backed by Redis, as an in-memory implementation isn’t that simple.
And “in-memory implementation isn’t that simple” is a good challenge. And I’ll take it today! It’s a good opportunity to merge ideas from those two articles. We’ll build a thread-safe, in-memory IdempotencyKey Store by combining single-writer, grouped queuing with idempotency guarantees. We’ll also explore the broader architectural principles behind grouping and ordering in queuing systems like Amazon SQS FIFO, Kafka, and Azure Service Bus.
But first, let’s start with the problem grouping solves.
Why Grouping Matters
In distributed systems, operations on shared state, whether it’s a bank account, an order, or a device, are often processed asynchronously. But asynchronous processing introduces risk: operations on the same entity can run out of order or at the same time, leaving its state inconsistent.
Grouping allows related operations to be processed sequentially while operations on different groups run concurrently. We correlate a set of tasks and declare that they must be processed in a certain order. For example, all operations on a single bank account, order, or device can form one group.
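The idea can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the QueueBroker from the series: `GroupedQueue`, `enqueue`, and the `demo` helper are hypothetical names. Tasks that share a group key are chained one after another, while tasks with different keys run concurrently.

```typescript
// Hypothetical helper (not the QueueBroker from the series): tasks sharing a
// group key run one after another; tasks with different keys run concurrently.
type Task<T> = () => Promise<T>;

class GroupedQueue {
  // Tail of the promise chain for every group that still has pending work.
  private readonly tails = new Map<string, Promise<unknown>>();

  enqueue<T>(groupKey: string, task: Task<T>): Promise<T> {
    const previous: Promise<unknown> =
      this.tails.get(groupKey) ?? Promise.resolve();
    // Start the task only after the previous task of this group has settled.
    const current = previous.then(() => task());
    // Store a tail that never rejects, so one failure does not block the group.
    const tail = current.catch(() => undefined);
    this.tails.set(groupKey, tail);
    // Forget the group once it is idle, so keys do not pile up.
    void tail.then(() => {
      if (this.tails.get(groupKey) === tail) this.tails.delete(groupKey);
    });
    return current;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Two updates for order-1 and one for order-2, started at the same time.
async function demo(): Promise<string[]> {
  const queue = new GroupedQueue();
  const log: string[] = [];
  const step = (label: string, ms: number) => async () => {
    await sleep(ms);
    log.push(label);
  };
  await Promise.all([
    queue.enqueue("order-1", step("order-1: placed", 30)),
    queue.enqueue("order-1", step("order-1: paid", 1)),
    queue.enqueue("order-2", step("order-2: placed", 5)),
  ]);
  return log;
}
```

In `demo`, "order-1: paid" is much faster than "order-1: placed", yet it is logged after it, because both share a group. "order-2: placed" doesn't wait for either of them.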
Grouping isn’t a new idea. Systems like Amazon SQS FIFO, Azure Service Bus, and Kafka have implemented variations of this pattern to solve similar problems.
How Popular Systems Handle Grouping and Ordering
Amazon SQS FIFO
Regular Amazon SQS doesn’t guarantee ordering. Still, some years ago, AWS added a variation: Amazon SQS FIFO (First-In-First-Out). Amazon SQS FIFO ensures that messages with the same message group ID are processed in order. Groups are independent, so tasks for one group don’t block tasks in another. For example, in an e-commerce system, you can group messages by orderId to ensure that updates for the same order (e.g. “Order Placed”, “Payment Confirmed”) are processed sequentially. Messages for different orders can be processed concurrently.
Of course, ordering guarantees come at the cost of throughput: FIFO queues process fewer messages per second than standard queues. We may also face bottlenecks in group processing: a slow task in one group delays all subsequent tasks in that group.
Azure Service Bus
Azure Service Bus uses sessions to group related messages. Messages in a session are processed sequentially, similar to SQS FIFO, but with added features like session locks to ensure only one consumer handles a session at a time. Session locks make Azure Service Bus robust against failures, but managing them adds complexity. Throughput can suffer if too many messages are grouped into the same session.
Kafka
Kafka achieves grouping through partitioning. With the default partitioner, messages with the same key land in the same partition. Each partition is consumed sequentially, and partitions are assigned to consumers for parallel processing. Partitioning requires careful design. Partitions represent a physical split, and poor partitioning can lead to uneven workload distribution. Within a consumer group, each partition is handled by a single consumer, so you cannot spread one partition’s messages across several consumers. When consumers fail or new consumers join, partitions must be reassigned, temporarily disrupting processing.
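To make key-based partitioning concrete, here is a small TypeScript sketch of mapping a grouping key to a partition. It is an illustration only: `hashKey`, `partitionFor`, and `distribute` are hypothetical names, and the FNV-1a-style hash is an assumption chosen for brevity. Kafka's default partitioner uses its own hash (murmur2) over the record key, so real partition numbers will differ.

```typescript
// Illustrative stand-in for a real partitioner: an FNV-1a-style hash over the
// key's UTF-16 code units. Real brokers use their own hash functions.
function hashKey(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep an unsigned 32-bit value
  }
  return hash;
}

// The same key always maps to the same partition while the count is stable.
function partitionFor(key: string, partitionCount: number): number {
  return hashKey(key) % partitionCount;
}

interface KeyedMessage {
  key: string;
  payload: string;
}

// Appends each message to its partition, preserving arrival order within it.
function distribute(
  messages: KeyedMessage[],
  partitionCount: number,
): KeyedMessage[][] {
  const partitions: KeyedMessage[][] = Array.from(
    { length: partitionCount },
    () => [],
  );
  for (const message of messages) {
    partitions[partitionFor(message.key, partitionCount)].push(message);
  }
  return partitions;
}

const partitions = distribute(
  [
    { key: "order-1", payload: "Order Placed" },
    { key: "order-2", payload: "Order Placed" },
    { key: "order-1", payload: "Payment Confirmed" },
  ],
  3,
);
```

Both `order-1` messages end up in one partition in their original order. Whether `order-2` shares that partition depends on the hash, which is exactly why a poorly chosen key or partition count can skew the workload.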
RabbitMQ
RabbitMQ ensures message ordering within a queue: messages are delivered in the same order they are enqueued. However, once multiple consumers subscribe to the queue, RabbitMQ distributes messages across them in a round-robin fashion, which can disrupt the processing order. RabbitMQ doesn't natively support grouping like SQS FIFO or Azure Service Bus sessions. Implementing message grouping typically requires creating additional queues or custom routing logic with exchanges, for example a dedicated queue per group.
While creating separate queues for each group ensures strict ordering, it increases operational complexity and can lead to scaling issues. Managing thousands of queues for a high-cardinality grouping key like userId can overwhelm RabbitMQ. It also increases resource usage: each queue consumes memory and CPU, which adds up in large-scale systems.
Guaranteeing idempotency with a queue
The generic implementation of idempotency handling looks as follows. In outline, we check whether the idempotency key has already been handled, skip the operation if it has, and otherwise run it and record the key.
In code, this takes two parts: a wrapper that runs a handler only for idempotency keys it hasn’t seen yet, and a store that keeps track of those keys.
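A minimal sketch of such generic handling could look like this. All names here (`IdempotencyKeyStore`, `tryAcquire`, `accept`, `release`, `idempotent`, `InMemoryKeyStore`) are assumptions made for illustration and may differ from the original listings.

```typescript
// Hypothetical store contract; the method names are assumptions of this sketch.
interface IdempotencyKeyStore {
  // Returns true if the key was free and is now reserved by the caller.
  tryAcquire(key: string): Promise<boolean>;
  // Marks the key as handled, so later duplicates are skipped.
  accept(key: string): Promise<void>;
  // Frees the key after a failure, so the operation can be retried.
  release(key: string): Promise<void>;
}

type Outcome<T> = { status: "handled"; result: T } | { status: "duplicate" };

// Runs the handler only if nobody has claimed the idempotency key yet.
async function idempotent<T>(
  store: IdempotencyKeyStore,
  key: string,
  handler: () => Promise<T>,
): Promise<Outcome<T>> {
  if (!(await store.tryAcquire(key))) return { status: "duplicate" };
  try {
    const result = await handler();
    await store.accept(key);
    return { status: "handled", result };
  } catch (error) {
    await store.release(key);
    throw error;
  }
}

// Demonstration-only store. Its check-and-set is atomic only because it runs
// synchronously on a single-threaded event loop; it gives no such guarantee
// across threads or processes.
class InMemoryKeyStore implements IdempotencyKeyStore {
  private readonly keys = new Map<string, "pending" | "accepted">();

  async tryAcquire(key: string): Promise<boolean> {
    if (this.keys.has(key)) return false;
    this.keys.set(key, "pending");
    return true;
  }

  async accept(key: string): Promise<void> {
    this.keys.set(key, "accepted");
  }

  async release(key: string): Promise<void> {
    this.keys.delete(key);
  }
}
```

Calling `idempotent(store, "payment-42", handler)` twice with the same store runs the handler once and reports the second call as a duplicate. If the handler throws, the key is released and a retry can go through.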
What’s most important is that the store’s operations are thread-safe and atomic. Without that, we may face race conditions. The safest option is to use a distributed lock, e.g., backed by Redis or a relational database. But we’re not here today to choose the safest option but to learn!
What if we had an implementation that ensures that tasks for a given key are processed one at a time? Actually, we have it from our older edition. Let’s join those forces together! We’ll use the queue broker’s single-writer capabilities to ensure that only a single check for the existence of a specific idempotency key runs at a time...
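One way to picture this combination is the following TypeScript sketch. It is an assumption-laden illustration, not the QueueBroker-based implementation from the series: `SingleWriterIdempotencyStore` and `handleOnce` are hypothetical names, and a per-key promise chain stands in for the broker's single-writer queue.

```typescript
// Hypothetical names throughout; a per-key promise chain stands in for the
// queue broker's single-writer processing.
class SingleWriterIdempotencyStore {
  private readonly handled = new Set<string>();
  private readonly tails = new Map<string, Promise<unknown>>();

  // Runs `handler` at most once per key. Calls sharing a key are chained, so
  // the existence check and the write never interleave for that key.
  handleOnce(key: string, handler: () => Promise<void>): Promise<boolean> {
    const previous: Promise<unknown> =
      this.tails.get(key) ?? Promise.resolve();
    const current = previous.then(async () => {
      if (this.handled.has(key)) return false; // duplicate: skip the handler
      await handler(); // other callers for this key wait here
      this.handled.add(key); // recorded only when the handler succeeded
      return true;
    });
    // Keep a tail that never rejects, so a failed attempt can be retried.
    const tail = current.catch(() => undefined);
    this.tails.set(key, tail);
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return current;
  }
}

// Two concurrent requests with the same key plus one with a different key.
async function demoIdempotency(): Promise<{ runs: number; results: boolean[] }> {
  const store = new SingleWriterIdempotencyStore();
  let runs = 0;
  const charge = async () => {
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    runs++;
  };
  const results = await Promise.all([
    store.handleOnce("payment-42", charge),
    store.handleOnce("payment-42", charge),
    store.handleOnce("payment-43", charge),
  ]);
  return { runs, results };
}
```

Without the chaining, both `payment-42` calls would pass the existence check before either finished, and the charge would run twice. With it, `demoIdempotency()` resolves to `runs === 2` and `results` equal to `[true, false, true]`.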