Show me the money! Practically navigating the Cloud Costs Complexity
In my last article, I explored how Amazon S3’s new conditional writes feature can be used to implement a strongly consistent event store. I think this feature opens up powerful architectural possibilities for distributed systems. But as much as we’d love to dive right into the code, there’s a bigger question to answer first: how much will it cost?

This article can’t start any other way than with one of my favourite movie scenes: Show you the money! Not you, show me the money!

We’ve all seen cloud bills get out of hand, often because the true infrastructure costs are harder to predict than they seem at first glance. Today, we’ll grab a calculator to discuss the costs of building an event store on S3.

I’m not an expert here; there are smarter and more skilled people than me. You might be one of them. That’s why I’m counting on your feedback, not only the money. I’ll show you the money so you can see how I typically calculate, manage, and optimise costs in the cloud. That can be food for thought.

You’ll see that this is not your typical “pay for storage and requests” breakdown. We’ll examine the specific challenges of request patterns, working set size, DELETE operations, and load spikes, and how to make smarter decisions based on AWS pricing models. I’ll use AWS, but similar calculations can and should be done for other cloud providers.

Using S3 Conditional Writes for Event Stores

Before we dive into the math, a quick recap: an event store is a key-value database where the results of all business operations are recorded as immutable events. A traditional record is represented as a sequence of events called an event stream. Each operation loads all events from the stream, builds the state in memory and appends a new event (a fact). As event stores are databases, they should give strong consistency guarantees, such as support for optimistic concurrency. The new conditional writes feature raised the question of whether S3 can give such guarantees now, and it can!
The If-None-Match header support in S3 ensures that only a single event can be appended with a specific stream (record) version. As the If-None-Match header only takes effect when a file is created, we need the following naming schema (or similar) to guarantee that:
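As a minimal sketch of the idea, the snippet below simulates create-if-absent semantics in memory instead of calling S3. The key layout `{stream_id}/{version}`, the `InMemoryBucket` class and the `append_event` helper are all hypothetical names chosen for illustration; they are not necessarily the schema or API from the original design. In real S3, a PUT with `If-None-Match: *` is rejected when the key already exists.

```python
class PreconditionFailed(Exception):
    """Raised when the key already exists (stands in for S3 rejecting the PUT)."""


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; it models only create-if-absent."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_none_match(self, key: str, body: bytes) -> None:
        # Mirrors a PUT with `If-None-Match: *`: succeed only if the key is absent.
        if key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body


def event_key(stream_id: str, version: int) -> str:
    # Assumed naming: the stream version is part of the key, zero-padded so keys sort.
    return f"{stream_id}/{version:010d}"


def append_event(bucket: InMemoryBucket, stream_id: str,
                 expected_version: int, payload: bytes) -> bool:
    """Append at expected_version + 1; False means another writer got there first."""
    try:
        bucket.put_if_none_match(event_key(stream_id, expected_version + 1), payload)
        return True
    except PreconditionFailed:
        return False


bucket = InMemoryBucket()
# Two writers both believe the stream is at version 0.
first = append_event(bucket, "claim-123", 0, b'{"type": "ClaimCreated"}')
second = append_event(bucket, "claim-123", 0, b'{"type": "ClaimAdjusted"}')
print(first, second)
```

Because the version is encoded in the key, two writers racing on the same version target the same object, and only one of them can create it. The loser has to reload the stream and retry, which is the optimistic concurrency behaviour described above.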
Effectively, each new event is a new file in the S3 bucket. S3 is not optimised for such usage, as it favours a smaller number of larger files. We discussed how to overcome that. The key design idea is active chunks: new events are written to a chunk (file) that remains active while it is being appended to. A chunk can contain:
Replaying all events from the beginning to rebuild an entity’s state can be costly in terms of GET requests (as we pay for each request). To reduce this, the design incorporates snapshots, which store a full representation of the current state within a chunk. By fetching the latest snapshot, the system can avoid replaying the entire event history, minimizing the number of GET operations.

We also need to pay for a PUT operation to append each event. As each chunk is named with an auto-incremented stream version, we must also pay for a LIST to find the latest one.

We also discussed compacting event stream data to reduce storage costs. Chunks can be compacted periodically, meaning old events are merged and unnecessary chunks are deleted, further lowering storage and request costs. Sealed chunks can be deleted or moved to lower-cost storage tiers like S3 Intelligent-Tiering or S3 Glacier.

This approach leverages S3's conditional writes to ensure consistency and manages costs through snapshots, chunking, and storage tiers.

So OK, how much will it cost?

Basic costs calculations

When designing an event-sourced system, it’s easy to assume that keeping track of system changes is a cheap, negligible cost nowadays. You just log some events and store them, right? But what happens when your event payloads grow larger than expected? Suddenly, small changes to event size can greatly impact your storage costs.

Let’s break down three scenarios, starting with a common event size, then looking at what happens if things go wrong.

4KB Events: The Lean and Efficient System

Let’s start with an efficient design. In this system, each event logs essential information about an insurance claim: claim creation, adjustments, approvals, and payments. It includes the necessary details like timestamps, customer info, and claim data.
In such a case, 4KB on average should be enough to also hold the snapshot and stream metadata (snapshots should be trimmed to keep only the data used in business logic; they don’t need complete information). Here’s what the system could look like:
Costs for 4KB Events

Let’s ignore the free tier and other promotions (for now). The costs could look as follows.
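A calculation of this kind can be sketched in a few lines of Python. Everything numeric here is an assumption rather than the article's own figures: the workload of one million operations per month is hypothetical, the prices are S3 Standard list prices for us-east-1 as I know them (check the current AWS pricing page), and sizes use decimal units for simplicity. The model follows the design described earlier, where each operation issues one LIST, one GET and one PUT.

```python
# Assumed S3 Standard list prices in USD (us-east-1); verify against current AWS pricing.
PRICE_PER_1000_PUT_OR_LIST = 0.005
PRICE_PER_1000_GET = 0.0004
PRICE_PER_GB_MONTH = 0.023

# Hypothetical workload, NOT the article's figures.
OPERATIONS_PER_MONTH = 1_000_000
EVENT_SIZE_KB = 4
MONTHS = 12


def request_cost(operations: int) -> float:
    """Each operation issues one LIST, one GET and one PUT."""
    put_and_list = 2 * operations / 1000 * PRICE_PER_1000_PUT_OR_LIST
    get = operations / 1000 * PRICE_PER_1000_GET
    return put_and_list + get


def storage_cost(operations_per_month: int, event_size_kb: float, months: int) -> float:
    """Data accumulates, so month m is billed for everything written in months 1..m."""
    gb_written_per_month = operations_per_month * event_size_kb / 1_000_000  # decimal units
    return sum(gb_written_per_month * m * PRICE_PER_GB_MONTH for m in range(1, months + 1))


yearly_requests = request_cost(OPERATIONS_PER_MONTH * MONTHS)
yearly_storage = storage_cost(OPERATIONS_PER_MONTH, EVENT_SIZE_KB, MONTHS)
print(f"requests: ${yearly_requests:.2f}, storage: ${yearly_storage:.2f}")
```

The storage function bills accumulated data month by month instead of multiplying the year-end volume by twelve, since events written in December are stored for only one month of year one.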
Total Year 1 Costs:

40KB Events: things start to escalate

While 4KB events are a more reasonable size for most event sourcing systems, it’s possible that due to poor design, additional (meta)data, or overly verbose data formats, events could balloon to 40KB. I wrote about this in Anti-patterns in event modelling - I'll just add one more field.

Costs for 40KB Events
Total Year 1 Costs:

It’s visible that the request costs stay the same, and that’s a significant strength of S3. You don’t pay transfer costs as long as the traffic stays within the same AWS region. Still, storage costs went ten times higher, going above the request cost. Let’s check the next scenario.

400KB Events: The Worst-Case Scenario

Now, let’s assume something went wrong in the design process. Instead of storing references to large documents (like PDFs or images), someone included the entire file within each event. The event size skyrockets to 400KB, an inefficient bloat.

With S3, the math is relatively simple; we can multiply the previous results by 10 and get:

Total Year 1 Costs:

Total Storage: 200GB

Now, the storage costs have skyrocketed.

Optimising storage costs

The key lesson is that storage costs grow with event size, but request costs stay flat. The number of events, on the other hand, adds to both request and storage costs.

AWS charges for requests based on the number of PUT, GET or LIST operations, not the size of the data being sent or retrieved. This means your request costs remain the same no matter how much the event size increases, while your storage costs balloon. If you stick with 4KB events, storage costs are tiny. However, as we saw in the 40KB and 400KB examples, larger events can increase your storage costs by 10x or even 100x.

Here are the things we can learn from it:...

Unlock this post for free, courtesy of Oskar Dudycz.
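The way event size affects the two cost components can be checked with a small sketch. As before, the workload (one million operations per month, each one LIST, one PUT and one GET) and the us-east-1 list prices are assumptions for illustration, not the article's figures, so only the ratios between scenarios are meaningful here.

```python
# Assumed S3 Standard list prices in USD (us-east-1); verify against current AWS pricing.
STORAGE_PER_GB_MONTH = 0.023
# One LIST + one PUT + one GET per operation, priced per 1,000 requests.
REQUEST_COST_PER_OPERATION = (0.005 + 0.005 + 0.0004) / 1000


def year_one_costs(operations_per_month: int, event_size_kb: float) -> tuple[float, float]:
    """Return (request_cost, storage_cost) in USD for the first 12 months."""
    requests = operations_per_month * 12 * REQUEST_COST_PER_OPERATION
    gb_per_month = operations_per_month * event_size_kb / 1_000_000  # decimal units
    # Month m is billed for everything written in months 1..m.
    storage = sum(gb_per_month * m * STORAGE_PER_GB_MONTH for m in range(1, 13))
    return requests, storage


# Hypothetical workload; only the event size changes between scenarios.
costs = {size: year_one_costs(1_000_000, size) for size in (4, 40, 400)}
for size, (requests, storage) in costs.items():
    print(f"{size:>3} KB: requests ${requests:,.2f}, storage ${storage:,.2f}")
```

Event size appears only in the storage term, so the request column is identical across all three rows while the storage column grows by a factor of ten each time.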