How a Kafka-Like Producer Writes to Disk
Imagine you’re sending a message to Kafka by calling something simple like:
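For example, with a client like kafkajs (the client choice, topic name, and message contents here are just illustrative; the shape of the call is what matters):

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });
const producer = kafka.producer();

await producer.connect();
await producer.send({
  topic: 'orders', // illustrative topic
  messages: [{ key: 'order-123', value: JSON.stringify({ amount: 42 }) }],
});
await producer.disconnect();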
We often treat this like a “black box”: we put messages in on one side and get them out on the other. The message leaves the producer, goes through the broker, and eventually appears in a consumer. That sounds straightforward, but behind the scenes, the technical implementation is a bit more complex.

Kafka uses an append-only log for each partition, storing messages in files on disk. We discussed that in detail in The Write-Ahead Log: The underrated Reliability Foundation for Databases and Distributed systems. Thanks to that, if the process crashes mid-write, Kafka detects partial data (via checksums) and discards it upon restart.

As I got positive feedback on mixing pseudocode (no offence, TypeScript!) with the concept explanation, let’s try to show that flow today! Of course, we won’t replicate all of real Kafka’s complexities (replication, the full batch format, time-based file rolling, etc.), but we’ll stay close enough logically to explain it and get closer to the backbone. By the end, we’ll have a simplified, working producer-to-disk flow.
We’ll also discuss why each piece exists and how that gives you a closer look at tooling internals. If you’re not into Kafka, that’s fine; this article can help you understand how other messaging tools use the disk and a WAL to keep their guarantees!

Before we jump into the topic, a short sidetrack. Or, actually, two.

First, I invite you to join my online workshop, Practical Introduction to Event Sourcing. I think you got a dedicated email about it, so let me just link to the page with details and a special 10% discount for you: https://ti.to/on3/dddacademy/discount/Oskar. Be quick, as the workshop happens in precisely 2 weeks!

Secondly, we just released the stable version of the MongoDB event store in Emmett. I wrote a detailed article explaining how we did it and how you can do it too. Since you’re here, you’ll surely like such nerd sniping. See: https://event-driven.io/en/mongodb_event_store/

Making it consistent and performant was challenging, so I think it's an interesting read. If you're considering key-value databases like DynamoDB and CosmosDB, this article outlines the challenges and solutions. My first choice is still PostgreSQL, but I'm happy with the MongoDB implementation we came up with. If MongoDB is already part of your tech stack and the constraints outlined in the article are not deal-breakers, this approach can deliver a pragmatic, production-friendly solution that balances performance, simplicity, and developer familiarity.

Ok, going back to our Kafka thing!

Producer Batching: The First Step

When your code calls producer.send, real Kafka doesn’t instantly push that single message to the broker. Instead, it accumulates messages into batches to reduce overhead. For example, if batch.size is set to 16 KB, Kafka’s producer library tries to fill up to 16 KB of messages for a particular partition, or waits for the time defined in linger.ms if the batch isn’t full yet, before sending them as one record batch. This drastically improves throughput, though it can add slight latency.

Below is pseudocode that demonstrates why we do batching at all. It doesn’t store anything on disk or send anything over the network; it just collects messages until we decide to flush:
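Here’s a minimal sketch of that idea (the class and parameter names are mine; maxBatchBytes and lingerMs stand in for Kafka’s batch.size and linger.ms, and the flush callback is where a real client would hand the batch to the network layer):

type Message = { key: string; value: string };

// A toy batching producer: collects messages until the batch is "full"
// or a timeout fires, then hands the whole batch to flush in one call.
class BatchingProducer {
  private batch: Message[] = [];
  private batchBytes = 0;
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(
    private readonly flush: (batch: Message[]) => void,
    private readonly maxBatchBytes = 16 * 1024, // mirrors batch.size = 16 KB
    private readonly lingerMs = 10,             // mirrors linger.ms
  ) {}

  send(message: Message): void {
    this.batch.push(message);
    // rough size estimate; real clients count serialized bytes
    this.batchBytes +=
      Buffer.byteLength(message.key) + Buffer.byteLength(message.value);

    if (this.batchBytes >= this.maxBatchBytes) {
      this.doFlush(); // batch is full: flush right away
    } else if (!this.timer) {
      // not full yet: wait at most lingerMs for more messages to arrive
      this.timer = setTimeout(() => this.doFlush(), this.lingerMs);
    }
  }

  private doFlush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.batch.length === 0) return;
    const toSend = this.batch;
    this.batch = [];
    this.batchBytes = 0;
    this.flush(toSend); // one call for many messages = less per-message overhead
  }
}

// Usage: 1000 sends end up as a handful of flushes, not 1000 network calls
const batching = new BatchingProducer((batch) =>
  console.log(`flushing ${batch.length} messages in one go`),
);
for (let i = 0; i < 1000; i++) batching.send({ key: `k${i}`, value: 'some payload' });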
In real Kafka, we’d have compression, partitioner logic, etc. But the concept stands: accumulate messages → send them in bigger chunks.

Brokers are responsible for coordinating the data transfer between producers and consumers and for ensuring that data is stored durably on disk. This matters for our “under the hood” look at log writes, because the broker typically writes entire batches, possibly compressed, to disk in a single append. That’s one of the essential things to know about why Kafka is performant: after a message reaches the broker, it’s just stored in the log and transferred to consumers. No additional logic happens.

As explained in the article about WAL, Kafka follows the classical WAL pattern:

- every incoming record is appended sequentially to the end of the log, which is the single source of truth,
- consumers read from that same log, so nothing else has to happen between write and read,
- after a crash, partial writes are detected (via checksums) and discarded on restart.
Single File Append: The Simplest Broker-Side Implementation

If we were to implement the broker side in a naive manner, we could keep a single file for all messages. Whenever a batch arrives, we append it to the end of that file. A minimal frame for each entry could look like this (the exact layout is our simplification, not Kafka’s real batch format):

[4 bytes: payload length][4 bytes: CRC32 checksum][payload bytes]

Where:

- the length prefix tells a reader how many bytes to consume for the payload, so the file can be scanned record by record,
- the checksum lets us detect partial or corrupted writes after a crash, as discussed above,
- the payload holds the raw batch bytes received from the producer.
Using Node.js’s built-in fs (file system) module, we could code the basic append-to-log logic as:
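Here’s a minimal sketch matching the frame above (the file name is illustrative, and zlib.crc32 requires Node.js 20.15 or newer; real Kafka does far more and, notably, doesn’t fsync every write by default):

import fs from 'node:fs';
import zlib from 'node:zlib';

const LOG_FILE = 'partition-0.log'; // illustrative name: one log file per partition

// Appends one payload to the log as [4B length][4B CRC32][payload bytes]
export function appendToLog(payload: Buffer): void {
  const header = Buffer.alloc(8);
  header.writeUInt32BE(payload.length, 0);
  header.writeUInt32BE(zlib.crc32(payload), 4);

  const fd = fs.openSync(LOG_FILE, 'a'); // open in append mode
  try {
    fs.writeSync(fd, Buffer.concat([header, payload]));
    fs.fsyncSync(fd); // force the write to disk before acknowledging
  } finally {
    fs.closeSync(fd);
  }
}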