Ordering, Grouping and Consistency in Messaging systems

Oskar Dudycz

Nov 18

We went quite far from our Queue Broker series in recent editions, but today, we’re back to it! Did you miss it?

In the initial article, we introduced the QueueBroker, a central component that manages queuing, backpressure, and concurrency. In distributed systems, queues do more than buffer requests or smooth traffic spikes. They can ensure order, enforce grouping, and provide guarantees for exclusive processing. These properties are critical when designing systems that must be both scalable and consistent.

On Friday, I wrote an extensive guide on how to handle idempotency in command handling on my blog. I explained the challenges in ensuring that our business operations are handled correctly without duplicates. I went through the various ways to handle it, both explicitly and generically, from building predictable state machines to handling it in business logic to leveraging Optimistic Concurrency, retries, and distributed locks.

I recommend handling idempotency using business logic, as it’s highly dependent on the business rules. And those rules tend to change. If we use generic handling, we’re always adding additional overhead, even if most of our applications rarely have idempotency issues. That’s something to consider, as it can also increase costs in cloud environments.

Such implementations are also tricky. We need to handle thread safety and atomic updates correctly. I showed an implementation using a distributed lock backed by Redis, as an in-memory implementation isn’t that simple.

And “in-memory implementation isn’t that simple” is a good challenge. And I’ll take it today!

It’s a good opportunity to merge ideas from those two articles. We’ll build a thread-safe, in-memory IdempotencyKey Store by combining single-writer, grouped queuing with idempotency guarantees. We’ll also explore the broader architectural principles behind grouping and ordering in queuing systems like Amazon SQS FIFO, Kafka, and Azure Service Bus.

But first, let’s start with the problem grouping solves.

Why Grouping Matters

In distributed systems, operations on shared state (whether it’s a bank account, an order, or a device) are often processed asynchronously. But asynchronous processing introduces risk:

Transactions arrive out of order: Imagine a bank processes a withdrawal before a deposit. A user with a $0 balance may see a failed transaction, even though funds were deposited seconds earlier.
Duplicate messages cause inconsistent state: A hotel system receives two commands to check in a guest. Without safeguards, the system may allocate two rooms to the same guest.
Concurrency can break assumptions: In an IoT system, a device sends a “door open” event while another service processes a “door locked” command. Without guarantees, these commands could execute simultaneously, leaving the system in an undefined state.

Grouping allows related operations to be processed sequentially while operations on different groups run concurrently. We correlate a set of tasks and declare that they must be processed in a specific order. For example:

Banking systems group by account: Ensuring withdrawals and deposits for the same account are serialized.
Order management systems group by order: Ensuring updates to the same order (e.g., payment, shipping) are processed in order.
IoT systems group by device: Ensuring events from the same device don’t conflict.
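The idea can be sketched in a few lines. The `GroupedTaskRunner` below is a hypothetical helper, not the QueueBroker from the earlier edition: it chains tasks that share a group key onto one promise, so they run one after another, while tasks for other keys proceed independently. It assumes a single Node.js process.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical helper: serializes tasks per group key, lets groups run concurrently.
class GroupedTaskRunner {
  // Tail of the promise chain for each group that still has pending work.
  private readonly tails = new Map<string, Promise<unknown>>();

  enqueue<T>(groupId: string, task: Task<T>): Promise<T> {
    const previous = this.tails.get(groupId) ?? Promise.resolve();

    // Start the task only after the previous task in the same group has settled.
    const current = previous.then(() => task());

    // The stored tail never rejects, so one failed task doesn't block the group.
    const tail = current.catch(() => undefined);
    this.tails.set(groupId, tail);

    // Drop the entry once the group is idle, so keys don't accumulate forever.
    void tail.then(() => {
      if (this.tails.get(groupId) === tail) this.tails.delete(groupId);
    });

    return current;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<string[]> {
  const runner = new GroupedTaskRunner();
  const log: string[] = [];

  await Promise.all([
    // The slow deposit still completes before the withdrawal on the same account.
    runner.enqueue("account-1", async () => {
      await sleep(20);
      log.push("account-1: deposit");
    }),
    runner.enqueue("account-1", async () => {
      log.push("account-1: withdrawal");
    }),
    // A different account is not held up by account-1.
    runner.enqueue("account-2", async () => {
      log.push("account-2: deposit");
    }),
  ]);

  return log;
}
```

The trade-off is the same one the brokers below make: a slow task holds up only its own group.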

Grouping isn’t a new idea. Systems like Amazon SQS FIFO, Azure Service Bus, and Kafka have implemented variations of this pattern to solve similar problems.

How Popular Systems Handle Grouping and Ordering

Amazon SQS FIFO

Regular Amazon SQS doesn’t provide any ordering guarantee. Still, some years ago, AWS added a variation: Amazon SQS FIFO (First-In-First-Out).

Amazon SQS FIFO ensures messages with the same message group ID are processed in the right order. Groups are independent, so tasks for one group don’t block tasks in another. For example, in an e-commerce system, group messages by orderId to ensure updates for the same order (e.g. “Order Placed”, “Payment Confirmed”) are processed sequentially. Messages for different orders can be processed concurrently.

Of course, ordering guarantees come at the cost of throughput: FIFO queues process fewer messages per second than standard queues. We may also face bottlenecks in group processing: a slow task in one group delays all subsequent tasks in that group.

Azure Service Bus

Azure Service Bus uses sessions to group related messages. Messages in a session are processed sequentially, similar to SQS FIFO, but with added features like session locks to ensure only one consumer handles a session at a time.

Session locks make Azure Service Bus robust against failures, but managing them adds complexity. Throughput can be impacted if too many messages are grouped into the same session.

Kafka

Kafka achieves grouping through partitioning. Messages with the same key land in the same partition, each partition is consumed sequentially, and partitions are assigned to consumers for parallel processing. Partitioning requires careful design.

Partitions represent a physical split. Poor partitioning can lead to uneven workload distribution. Within a consumer group, each partition is handled by a single consumer, so you cannot distribute one partition's messages across several consumers. When consumers fail or new consumers join, partitions must be reassigned, temporarily disrupting processing.
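To make key-based partitioning concrete, here is a minimal sketch of how a key can be mapped to a partition. It is illustrative only: Kafka's default partitioner hashes the key bytes with murmur2, while this sketch uses FNV-1a as a simple stand-in, and `partitionFor`, `assignToPartitions`, and `OrderEvent` are hypothetical names.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units; a stand-in for Kafka's murmur2.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same key always maps to the same partition, as long as the count is fixed.
function partitionFor(key: string, partitionCount: number): number {
  return fnv1a(key) % partitionCount;
}

type OrderEvent = { orderId: string; type: string };

// Appends each event to the partition chosen by its orderId,
// so events for one order keep their relative order inside that partition.
function assignToPartitions(
  events: OrderEvent[],
  partitionCount: number,
): OrderEvent[][] {
  const partitions: OrderEvent[][] = Array.from(
    { length: partitionCount },
    () => [],
  );
  for (const event of events) {
    partitions[partitionFor(event.orderId, partitionCount)].push(event);
  }
  return partitions;
}
```

The sketch also shows where skew comes from: a key with far more traffic than the others sends all of that traffic to a single partition, and changing the partition count changes where keys land.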

RabbitMQ

RabbitMQ ensures message ordering within a queue. Messages are delivered in the same order they are enqueued. However, once multiple consumers subscribe to the queue, RabbitMQ balances messages across consumers in a round-robin fashion, potentially disrupting the ordering.

RabbitMQ doesn't natively support grouping like SQS FIFO or Azure Service Bus sessions. Implementing message grouping typically requires creating additional queues or custom routing logic with exchanges. To achieve that you could:

Use one queue per group. For instance, in an e-commerce system, you can create a queue for each orderId to process updates like "Order Placed" and "Order Shipped" sequentially.
Alternatively, use consistent hashing to route messages with the same key (e.g., orderId) to the same queue, ensuring grouped ordering.

While creating separate queues for each group ensures strict ordering, it increases operational complexity and can lead to scaling issues. Managing thousands of queues for a high-cardinality grouping key like userId can overwhelm RabbitMQ. It also increases resource usage: each queue consumes memory and CPU, which adds overhead in large-scale systems.
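Consistent hashing keeps the number of queues fixed and small while still pinning each key to one queue. The sketch below shows the general technique with a hash ring. It is not RabbitMQ's implementation, and `HashRing`, `queueFor`, and the queue names are assumptions made for illustration.

```typescript
// FNV-1a 32-bit hash, used here only to place points on the ring.
function hash32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

type RingEntry = { point: number; queue: string };

// Hypothetical hash ring: each queue owns several points on a 32-bit circle.
class HashRing {
  private readonly ring: RingEntry[] = [];
  private readonly replicas: number;

  constructor(queues: string[], replicas: number = 50) {
    this.replicas = replicas;
    for (const queue of queues) this.add(queue);
  }

  add(queue: string): void {
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ point: hash32(`${queue}#${i}`), queue });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Routes a key to the first point at or after its hash, wrapping around.
  queueFor(routingKey: string): string {
    if (this.ring.length === 0) throw new Error("No queues registered");
    const point = hash32(routingKey);
    const entry = this.ring.find((e) => e.point >= point) ?? this.ring[0];
    return entry.queue;
  }
}

const ring = new HashRing(["orders-1", "orders-2", "orders-3"]);
// Every message for the same order is routed to the same queue.
const sameQueue = ring.queueFor("order-42") === ring.queueFor("order-42");
```

Compared with a plain hash modulo the queue count, adding a queue to the ring only moves keys onto the new queue; all other keys stay where they were. Ordering for a key still requires a single consumer per queue.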

Guaranteeing idempotency with a queue

The generic implementation of the idempotency handling looks as follows. We:

Attempt to acquire the lock for the specified key.
If the attempt wasn’t successful, return the default result.
If we acquire the lock, then run the business logic.
If it fails, release the lock. We don’t want to keep the key locked; instead, we let someone fix the state condition and retry later.
If it was successful, accept the lock and return the result.

In code it can look as follows:

async function idempotent<T>(
  handler: () => Promise<T>,
  options: IdempotentOptions<T>,
): Promise<T> {
  const { key, timeoutInMs, defaultResult, store } = options;

  // Attempt to acquire the lock for the specified key
  const acquired = await store.tryLock(key, { timeoutInMs });

  // If the key is already locked or accepted, return the default result
  if (!acquired) return await defaultResult();

  try {
    // Run the business logic
    const result = await handler();

    // Mark the operation as successful
    await store.accept(key);

    return result;
  } catch (error) {
    // Release the key so the operation can be retried later
    await store.release(key);
    throw error;
  }
}

type IdempotentOptions<T> = {
  store: IdempotencyKeyStore;
  key: string;
  defaultResult: () => T | Promise<T>;
  timeoutInMs?: number;
};

The store can be defined as:

type IdempotencyKeyStore = {
  tryLock: (
    key: string,
    options?: { timeoutInMs?: number },
  ) => boolean | Promise<boolean>;
  accept: (key: string) => void | Promise<void>;
  release: (key: string) => void | Promise<void>;
};

The store has the following methods:

tryLock - that tries to lock the specified key for a certain period of time. Returns true if the lock was acquired and false otherwise.
accept - accepts the key lock after successful handling.
release - releases the lock in case of handling failure.
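To see the contract in action, here is a naive in-memory sketch of the store. It is not the queue-based implementation this article builds towards, and `createInMemoryStore` and the `Entry` type are hypothetical. It assumes a single Node.js process, where each synchronous method body runs without interleaving, and it assumes accepted keys are kept indefinitely.

```typescript
type IdempotencyKeyStore = {
  tryLock: (
    key: string,
    options?: { timeoutInMs?: number },
  ) => boolean | Promise<boolean>;
  accept: (key: string) => void | Promise<void>;
  release: (key: string) => void | Promise<void>;
};

type Entry = { status: "locked" | "accepted"; expiresAt: number | null };

// Hypothetical factory; the clock is injectable so that expiry can be tested.
function createInMemoryStore(
  now: () => number = Date.now,
): IdempotencyKeyStore {
  const entries = new Map<string, Entry>();

  return {
    tryLock: (key, options) => {
      const existing = entries.get(key);
      const expired =
        existing?.status === "locked" &&
        existing.expiresAt !== null &&
        existing.expiresAt <= now();

      // An accepted key, or a lock that hasn't expired yet, cannot be taken.
      if (existing && !expired) return false;

      const timeoutInMs = options?.timeoutInMs;
      entries.set(key, {
        status: "locked",
        expiresAt: timeoutInMs === undefined ? null : now() + timeoutInMs,
      });
      return true;
    },
    accept: (key) => {
      // Assumption: accepted keys never expire in this sketch.
      if (entries.has(key)) {
        entries.set(key, { status: "accepted", expiresAt: null });
      }
    },
    release: (key) => {
      // Only pending locks are released; accepted keys stay recorded.
      if (entries.get(key)?.status === "locked") entries.delete(key);
    },
  };
}

const keyStore = createInMemoryStore();
keyStore.tryLock("payment-123", { timeoutInMs: 5_000 }); // true: lock acquired
keyStore.tryLock("payment-123"); // false: the key is already locked
```

The check and the write in `tryLock` happen in one synchronous step, which is the only reason this sketch avoids races. Put an `await` between them, or share the map across processes, and two callers could both acquire the same key.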

What’s most important is that those operations should be thread-safe and atomic. Without that, we may face race conditions. The safest option is to use a distributed lock, e.g., Redis or a relational database.

But we’re not here today to choose the safest options but to learn!

What if we had an implementation that ensures tasks for a given key are processed one at a time? Actually, we have one from an earlier edition.

Let’s join those forces together!

We’ll use the queue broker’s single-writer capabilities to ensure that only a single check for the existence of a specific idempotency key is made at a time...
