[Last Week in AWS Extras]: An AWS Database Safari

 

In today's newsletter, I take you on a journey through the various database offerings AWS has, and give my thoughts about each of them. There are a lot! You may complain that I didn't hit your favorite, because anything is a database if you hold it wrong. That's okay; I'm sure some unscrupulous database vendor will be along shortly to dry your tears.

 

As always, should you wish to link someone else to this post, you can forward it, or else view it on the web.

 

 

Have you heard about ChaosSearch, the fully managed log analytics platform that leverages your Amazon S3 as a data store? According to the CTO at Armor, a global cybersecurity company with more than 1,000 customers in 42 countries, “ChaosSearch is a critical piece of our infrastructure for processing tens of terabytes per day of our customers’ log data.” And at Hubspot, the Engineering Lead said “We are able to process and analyze 10's of terabytes a day of Cloudflare log data to identify and fend off DDoS attacks on behalf of our customers at a fraction of the cost of our previous self-hosted ELK Stack.” So take it from me, or take it from them - either way, take a look at ChaosSearch today! Sponsored

 

 

An AWS Database Safari

When we talk about vendor lock-in, one of the most common stories we see is one of databases. The database you pick to hold your data is something you're going to be using for a good long while; migrations are painful, expensive, time-consuming, and, in some cases, barely possible.

 

Amazon themselves ran on top of Oracle for a long time. Despite an extreme incentive to get off of it as fast as possible, doing so successfully took them years and required inventing their own database engine.

 

If you take a look at AWS's offerings for databases, you’ll see ... a lot of options. Let's ignore their RDS databases, of which there are many. (There are five database engines to choose from, and their Aurora variants speak two of those engines’ languages fluently. Then we add in two more for Aurora's Serverless options, bringing this thing we're handwaving away to nine database options already.) Why does AWS have so many database options?

 

The simple answer is that different databases support different use cases. After all, "every AWS product is for somebody, and no AWS product is for everybody."

 

Picking a database is a "one-way door," as Amazonians like to call it. It's painful and annoying to migrate databases even between different versions of the same engine. When you pick a database, you're making a commitment—whether you know it or not.

 

Let's start simple with...

Amazon Redshift

Redshift is an opinionated version of PostgreSQL that's designed for data warehouse projects. That is, it's meant for relational workloads that are likely to hit petabyte scale. Don't let the pricing fool you; you're not going to run just one of these suckers.

Amazon Athena

More cost-effective is Athena. This uses a variant of the Presto engine for running SQL queries against data that lives in S3. It's way, way, way cheaper to store data inside of S3 than it is to shove it onto disks attached to relational data stores, and this works well for ad-hoc querying, too.

 

The challenge is that query performance and response latency are nondeterministic, meaning you may not want to have this hooked up to anything interactive in a web form. As Redshift begins to speak to S3 more effectively, the lines between Redshift and Athena blur.

SimpleDB

You might think SimpleDB has been discontinued. You'd be wrong. It's not in the AWS console because it never has been. Until a couple of weeks ago, there was an AWS job posting for the SimpleDB team in Chennai. SimpleDB is still there, and AWS gives every appearance that it's not going anywhere. That said, unless you're already using it, you probably don't want to start. The best guidance is to instead consider...

DynamoDB

DynamoDB is an interesting take on a NoSQL database. If you know what your queries are going to look like in advance, it's hard to beat. It offers a key-value store but can also masquerade as a document store (more about those in a bit).
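To make "know your queries in advance" a bit more concrete, here is a toy, in-memory sketch of a table keyed by a partition key plus a sort key. It is not the DynamoDB API; `ToyTable`, its methods, and the `user#`/`order#` key format are all made up for illustration. The point it tries to show is that lookups you planned for go straight to one partition, while anything you didn't plan for means walking every item.

```python
class ToyTable:
    """Hypothetical stand-in for a table keyed by (partition key, sort key)."""

    def __init__(self):
        # partition key -> {sort key: item}
        self._partitions = {}

    def put(self, pk, sk, item):
        self._partitions.setdefault(pk, {})[sk] = item

    def get(self, pk, sk):
        # Direct key lookup: the access pattern this design is built for.
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # Only touches one partition; results come back in sort-key order.
        rows = self._partitions.get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk.startswith(sk_prefix)]

    def scan(self, predicate):
        # The unplanned query: every item in every partition gets inspected.
        return [
            item
            for rows in self._partitions.values()
            for item in rows.values()
            if predicate(item)
        ]


table = ToyTable()
table.put("user#42", "order#2020-06-01", {"total": 30})
table.put("user#42", "order#2020-06-15", {"total": 12})
table.put("user#42", "profile", {"name": "Ada"})
table.put("user#7", "order#2020-06-03", {"total": 99})

# Planned access pattern: "all orders for one user".
orders = table.query("user#42", "order#")

# Unplanned access pattern: "all orders over 20, for anyone".
big_orders = table.scan(lambda item: item.get("total", 0) > 20)
```

The key format doubles as the document-store trick: a profile and a pile of orders live under the same partition key and are told apart by sort-key prefix.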

 

It's inexpensive when configured properly, its responsiveness is impressive, and you have no compute infrastructure to manage. But you do need to make your peace with the unfortunate fact that since it is proprietary, anything you're using in DynamoDB is unlikely to work anywhere else. A migration off of AWS therefore means you're also rearchitecting your DynamoDB data stores—and the applications that interface with them.

Amazon Neptune

Neptune is a managed graph database, which is an accurate answer that tells you absolutely nothing. Graph databases are great at returning results that highlight relationships. "This user is friends with the following users" is the canonical example, because basically anything else requires 80 pages to explain. My position is, and steadfastly remains, this: "If you need a graph database, you almost certainly know it, and Neptune is on the table; otherwise, move along."

Amazon QLDB

QLDB, or Quantum Ledger Database, is a database engine that arose from the question "What if we needed a blockchain but without all of the hype-driven nonsense that makes blockchains ridiculous for most use cases?"

 

In other words, if you need a ledger-style database but can trust a central authority, QLDB is for you. If you can't trust a central authority, Amazon Managed Blockchain might be a better answer. But let's face it: At that point, you're almost certainly past trusting a cloud provider, aren't you?
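If "ledger-style database with a central authority" sounds abstract, the general idea can be sketched as an append-only list where each entry carries a hash of the one before it. This is a minimal illustration of hash chaining, not QLDB's API or internals; `ToyLedger` and everything in it is hypothetical.

```python
import hashlib
import json


class ToyLedger:
    """Hypothetical append-only ledger; one trusted party holds the only copy."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self._entries = []  # each entry: {"data": ..., "prev": hex, "hash": hex}

    @staticmethod
    def _digest(data, prev_hash):
        # Hash the entry's data together with the previous entry's hash.
        payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data):
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {"data": data, "prev": prev_hash, "hash": self._digest(data, prev_hash)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        # Recompute every hash in order; any edited entry breaks the chain.
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["data"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


ledger = ToyLedger()
ledger.append({"account": "alice", "delta": 100})
ledger.append({"account": "alice", "delta": -30})
intact = ledger.verify()

# Quietly rewrite history, then check again.
ledger._entries[0]["data"]["delta"] = 1000000
still_intact = ledger.verify()
```

Nothing here needs consensus, mining, or a token; it just needs you to trust whoever runs the ledger, which is the whole trade being described.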

Amazon Timestream

While we're delving into the realm of fantasy, let's look at Timestream, their take on a time series database. This isn't an objectively nutty thing to want; my criticism of it comes from the fact that it was announced at re:Invent (AWS’s own version of Cloud Next) 2018, and more than a year and a half later, it has yet to enter either public preview or general availability. Time series databases (like InfluxDB) are great at storing and displaying data over time: metrics, logs, and events, frequently at incredibly high volume. This is a big deal not just in application monitoring but also in the world of IoT. Devices in the field reporting vast quantities of data back to a central point will often look for something in the time series space.

Amazon ElastiCache

ElastiCache has two variants (Redis and Memcached), both of which serve as in-memory databases. This means incredibly quick response times are available, since the query never has to touch the disk. It's generally used for keeping session data around.
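Here is a rough sketch of the session-data pattern, with a plain Python dict standing in for the cache. Nothing below talks to Redis, Memcached, or ElastiCache; `InMemorySessionStore`, `WebServer`, and the `restart` method are hypothetical names invented for the example.

```python
import uuid


class InMemorySessionStore:
    """Hypothetical stand-in for a shared cache; nothing is written to disk."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"user": user}
        return session_id

    def lookup(self, session_id):
        return self._sessions.get(session_id)

    def restart(self):
        # Simulates the cache node losing power: everything in memory is gone.
        self._sessions.clear()


class WebServer:
    """Hypothetical web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create(user)

    def handle(self, session_id):
        session = self.store.lookup(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello, {session['user']}"


store = InMemorySessionStore()
web_a = WebServer("web-a", store)
web_b = WebServer("web-b", store)

session_id = web_a.login("corey")       # user logs in on one server...
before = web_b.handle(session_id)       # ...and is recognized by the other

store.restart()                         # the cache loses its memory
after = web_b.handle(session_id)        # and the user is a stranger again
```

Because both servers consult the same store, it doesn't matter which one handles a given request; it also means the store is the one place where losing memory logs everybody out.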

 

A very common use case is having multiple web servers behind a load balancer, but sharing the session data so that users don't have to log in again every time the load balancer gives them to a different server. The risk, of course, is that since the data isn't persisted to disk, you're one power outage away from data loss. Speaking of data loss...

Amazon DocumentDB (with MongoDB compatibility)

MongoDB is a storied database that achieves some great things. Unfortunately, it also likes to emphasize performance at the potential cost of data integrity and historically has done so by burying some very important caveats deep within its documentation.

 

That said, what makes DocumentDB interesting to me is that AWS does no real marketing of the service past talking about how compatible with MongoDB 3.6 it is. My takeaway from that message is this: "If you want to run MongoDB in an AWS environment, consider this." Based upon MongoDB's community interactions, I don't want to run it at all, so I don't spend much time paying attention to DocumentDB, either. If you're less judgmental of MongoDB than I am, this is worth a gander.

Amazon Keyspaces (for Apache Cassandra)

Similarly, the AWS documentation for Keyspaces falls far short. It spends most of its energy talking about how compatible with Cassandra it is—albeit in ways that sound suspiciously like DynamoDB.

 

It's worth noting that one of the Dynamo paper authors later went on to build Cassandra. If I wanted to use a managed service but still have a theoretical database exodus strategy I could fall back to, I'd consider this as my first stop on the path.

Amazon Route 53

Lastly, we come to my favorite database: Route 53.

 

You might argue that DNS isn't a database; I would argue that an eventually consistent world-spanning key-value store that (in this case) offers a 100% uptime SLA is awfully hard to see as anything other than a database. I will accept that I'm in the minority opinion (for now!), but I will highlight that an awful lot of what people are misusing S3 for could just as easily be done well with Route 53.

 

This concludes my survey of AWS's database offerings. Unfortunately, this will rapidly go out of date; there are multiple job postings on AWS's site looking for people to work on "unreleased database products," so it's pretty clear that I may have to revisit this topic after re:Invent (AWS’s own version of Cloud Next) unless I want to watch it age badly.

 

 

Definitive Guide to AWS EKS Security - Download eBook

When using Amazon’s Elastic Kubernetes Service (EKS), you must understand which pieces of the security management role fall on you. Use this 42-page eBook from StackRox to learn about EKS cluster security, including the standard controls and best practices for minimizing the risk around cluster workloads, as well as specific requirements for securing an EKS cluster and its associated infrastructure. Sponsored

 
 
 

I’m Corey Quinn

I help companies address their horrifying AWS bills by both reducing the dollars spent and helping them understand what they’re paying for.

 
 

Screaming in the Cloud & AWS Morning Brief

In addition to this newsletter, I host two podcasts: Screaming in the Cloud, about the business of cloud computing, featuring me talking to folks who are good at things; and AWS Morning Brief, a show about exclusively AWS with my snark at full-tilt.

 
 

Sponsor an Issue

Reach over 19,000 discerning engineers, managers, and enthusiasts who actually care about the state of Amazon's cloud ecosystems.

 




 

Duckbill Group

1728 Ocean Ave #307, San Francisco, CA 94112

 
                                                           
