How a Kafka-Like Producer Writes to Disk
Imagine you’re sending a message to Kafka by calling something simple like:
We often treat this like a “black box”: we put messages in on one side and get them out on the other. The message leaves the producer, goes through the broker, and eventually appears in a consumer. This sounds straightforward, but behind the scenes, the technical implementation is a bit more complex.
Kafka uses an append-only log for each partition, storing messages in files on disk. We discussed that in detail in The Write-Ahead Log: The underrated Reliability Foundation for Databases and Distributed systems. Thanks to that, if the process crashes mid-write, Kafka detects partial data (via checksums) and discards it upon restart.
As I got positive feedback on mixing pseudocode (no offence, TypeScript!) with the concept explanation, let’s try to show that flow today! Of course, we won’t replicate all of real Kafka’s complexities (replication, the full batch format, time-based file rolling, etc.), but we’ll try to stay close enough logically to explain it and get closer to the backbone. By the end, we’ll have:
We’ll also discuss why each piece exists and how that gives you a closer look at tooling internals. If you’re not into Kafka, that’s fine. This article can help you understand how other messaging tools use the disk and a WAL to keep their guarantees!
Before we jump into the topic, a short sidetrack. Or, actually, two.
First, I invite you to join my online workshop, Practical Introduction to Event Sourcing. I think you got a dedicated email about it, so let me just link here to the page with details and a special 10% discount for you. It’s available through this link: https://ti.to/on3/dddacademy/discount/Oskar. Be quick, as the workshop will happen in precisely 2 weeks!
Secondly, we just released the stable version of the MongoDB event store in Emmett. I wrote a detailed article explaining how we did it and how you can do it. Since you’re here, you’ll surely like such nerd sniping. See: https://event-driven.io/en/mongodb_event_store/
Making it consistent and performant was challenging, so I think that's an interesting read. If you're considering using key-value databases like DynamoDB and CosmosDB, this article can outline the challenges and solutions. My first choice is still PostgreSQL, but I'm happy with the MongoDB implementation we came up with. If MongoDB is already part of your tech stack and the constraints outlined in the article are not deal-breakers, this approach can deliver a pragmatic, production-friendly solution that balances performance, simplicity, and developer familiarity.
Ok, going back to our Kafka thing!
Producer Batching: The First Step
When your code calls producer.send, real Kafka doesn’t instantly push that single message to the broker. Instead, it accumulates messages into batches to reduce overhead.
For example, if batch.size is set to 16 KB, Kafka’s producer library tries to fill up to 16 KB of messages for a particular partition, or waits up to the time defined in linger.ms if the batch isn’t full yet, before sending them as one record batch. This drastically improves throughput, though it can add slight latency. Below is pseudocode that demonstrates why we do batching at all: it doesn’t store anything on disk or send anything over the network, it just collects messages until we decide to flush:
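To make the batching idea concrete, here is a minimal sketch of a size-triggered batcher. All names in it (Message, BatchingProducer, sendBatch, the toy partitioner) are hypothetical and are not the real Kafka client API. The linger.ms timer is left out on purpose: in this sketch, such a timer would simply call flushAll.

```typescript
// Illustrative sketch only: Message, BatchingProducer and sendBatch are
// hypothetical names, not the real Kafka client API.
type Message = { key: string; value: string };
type SendBatch = (partition: number, messages: Message[]) => void;

class BatchingProducer {
  private readonly maxBatchBytes: number; // plays the role of batch.size
  private readonly partitionCount: number;
  private readonly sendBatch: SendBatch;
  // One open batch per partition, with a running size in bytes.
  private readonly open = new Map<number, { messages: Message[]; bytes: number }>();

  constructor(maxBatchBytes: number, partitionCount: number, sendBatch: SendBatch) {
    this.maxBatchBytes = maxBatchBytes;
    this.partitionCount = partitionCount;
    this.sendBatch = sendBatch;
  }

  // Adds the message to its partition's batch and flushes once the batch is full.
  send(message: Message): void {
    const partition = this.partitionFor(message.key);
    let batch = this.open.get(partition);
    if (!batch) {
      batch = { messages: [], bytes: 0 };
      this.open.set(partition, batch);
    }
    batch.messages.push(message);
    batch.bytes += Buffer.byteLength(message.key) + Buffer.byteLength(message.value);
    if (batch.bytes >= this.maxBatchBytes) this.flush(partition);
  }

  // Hands the partition's open batch to sendBatch in one call.
  flush(partition: number): void {
    const batch = this.open.get(partition);
    if (!batch) return;
    this.open.delete(partition);
    this.sendBatch(partition, batch.messages);
  }

  // A linger.ms-style timer would call this to send batches that never filled up.
  flushAll(): void {
    for (const partition of Array.from(this.open.keys())) this.flush(partition);
  }

  // Toy partitioner (an assumption): sum of character codes modulo partition count.
  private partitionFor(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) hash += key.charCodeAt(i);
    return hash % this.partitionCount;
  }
}

// Usage: three sends to a single partition result in only two "network" calls.
const demoBatches: Message[][] = [];
const demoProducer = new BatchingProducer(16, 1, (_partition, messages) => {
  demoBatches.push(messages);
});
demoProducer.send({ key: "user-1", value: "A" }); // 7 bytes, stays buffered
demoProducer.send({ key: "user-2", value: "BCDEFGHIJ" }); // 15 more bytes, 22 >= 16: flush
demoProducer.send({ key: "user-3", value: "C" }); // buffered again
demoProducer.flushAll(); // sends the leftover batch
console.log(demoBatches.map((batch) => batch.length)); // [ 2, 1 ]
```

The trade-off is visible even in this toy version: the first message waits in memory until the second one fills the batch, which is the latency we pay for fewer, larger sends.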
In real Kafka, we’d have compression, partitioner logic, etc. But the concept stands: accumulate messages → send them in bigger chunks.
Brokers are responsible for coordinating the data transfer between producers and consumers and ensuring that data is stored durably on disk. Batching is important for the “under the hood” log writes because the broker typically writes entire batches, possibly compressed, to disk in a single append. That’s one of the essential reasons why Kafka is performant. After the message is sent to the broker, it’s just stored in the log and transferred to consumers. No additional logic happens. As explained in the article about WAL, Kafka follows the classical WAL pattern:
Single File Append: The Simplest Broker-Side Implementation
If we were to implement the broker side in a naive manner, we could keep a single file for all messages. Whenever a batch arrives, we append it to the end of that file, storing it in the following format:
Where:
Using the built-in Node.js fs (File System) library, we could code the basic append-to-log logic as:
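A minimal sketch of such an append could look like the code below. The record layout is an assumption made for illustration (a 4-byte length, a 4-byte CRC-32 of the payload, then the payload itself) and is much simpler than real Kafka's batch format. The function names appendBatch and readValidBatches are hypothetical too. The reader shows the recovery idea mentioned earlier: stop at the first record that is truncated or fails its checksum.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed layout per record: [length: uint32 BE][crc32: uint32 BE][payload bytes]
const HEADER_SIZE = 8;

// Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), kept dependency-free.
function crc32(data: Buffer): number {
  let crc = 0xffffffff;
  for (let i = 0; i < data.length; i++) {
    crc ^= data.readUInt8(i);
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Appends one batch and returns the byte position at which the record starts.
function appendBatch(logPath: string, payload: Buffer): number {
  const header = Buffer.alloc(HEADER_SIZE);
  header.writeUInt32BE(payload.length, 0);
  header.writeUInt32BE(crc32(payload), 4);

  const fd = fs.openSync(logPath, "a"); // "a": create if missing, always write at the end
  try {
    const position = fs.fstatSync(fd).size;
    fs.writeSync(fd, Buffer.concat([header, payload])); // header and payload in one write
    fs.fsyncSync(fd); // ask the OS to flush to disk before we report success
    return position;
  } finally {
    fs.closeSync(fd);
  }
}

// Reads records from the start and stops at the first truncated or corrupted one.
function readValidBatches(logPath: string): Buffer[] {
  const data = fs.readFileSync(logPath);
  const batches: Buffer[] = [];
  let position = 0;
  while (position + HEADER_SIZE <= data.length) {
    const length = data.readUInt32BE(position);
    const expectedCrc = data.readUInt32BE(position + 4);
    const end = position + HEADER_SIZE + length;
    if (end > data.length) break; // partial write: payload was cut short
    const payload = data.subarray(position + HEADER_SIZE, end);
    if (crc32(payload) !== expectedCrc) break; // checksum mismatch: discard the tail
    batches.push(payload);
    position = end;
  }
  return batches;
}

// Usage: append two batches to a temporary log file and read them back.
const logPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "wal-")), "partition-0.log");
const firstPosition = appendBatch(logPath, Buffer.from("batch-1"));
const secondPosition = appendBatch(logPath, Buffer.from("batch-2"));
console.log(firstPosition, secondPosition, readValidBatches(logPath).length);
```

Calling fsync on every append is the simplest way to show the durability step, but it is also the slowest. It is a choice made for this sketch, not a statement about how Kafka schedules its flushes.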