The Five Important Trends in Data, and the One Megatrend Powering Them All

Jul 30, 2020 05:00 pm

Yesterday, Dremio hosted the Subsurface Conference, the first conference on cloud data lakes. More than 5000 people registered, and more than 2500 attended. If one had doubts that cloud data lakes are a strategic area for many in the data ecosystem, those figures should quash them.

I delivered a presentation at the end of the day that I’ll share here. Entitled 5 Data Trends You Should Know, the presentation covers the major trends we observe in the data world. Here’s a quick narrative of the talk.

There is a megatrend underpinning the changes in data design philosophy and tooling: the rise of the data engineer. Data engineers are the people who move, shape, and transform data from the source to the tools that extract insight. We believe data engineers are the change agents in a decade-long process that will revolutionize data.

Data systems used to be purchased by IT. But in the last 20 years, individual departments started to purchase their own data systems. Each team, using its own data systems, develops its own proprietary data products: analyses, dashboards, machine learning systems, even new product features.

Data systems rely on data from other teams. So all of these teams share data. And just like that, the company has built a data mesh: a network of producers and consumers of data who share data via standard APIs or open-source formats like Apache Arrow & Parquet. When the data is stored in the cloud, we call it a cloud data lake.

Data engineers stand on the shoulders of 70 years of software development experience and take many of the learnings from that discipline. One example is developing a data engineering lifecycle. This is our current understanding of a typical data engineering software development lifecycle.

There are six steps:

  1. Ingesting data from the systems that produce it and writing it into open formats in the cloud
  2. Planning the software to build
  3. Querying data using a compute engine which runs across the cloud data lake
  4. Modeling the data to ensure there is one centralized definition of every metric with an owner, a lineage, and a status
  5. Developing the data product which could be analyses, BI reports, machine learning models, production features
  6. Monitoring and testing the data to ensure data consistency & integrity over time
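Step 4, one centralized definition of every metric with an owner, a lineage, and a status, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the class names, the field names, and the example metric are hypothetical and do not come from the talk or from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One metric, defined once (all field names are hypothetical)."""
    name: str
    owner: str            # team accountable for the definition
    sql: str              # the single agreed-upon way to compute it
    lineage: tuple = ()   # upstream tables the metric depends on
    status: str = "draft"


class MetricRegistry:
    """Holds exactly one definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        # Refuse a second definition so every team shares the same one.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def get(self, name):
        return self._metrics[name]


registry = MetricRegistry()
registry.register(MetricDefinition(
    name="weekly_active_users",
    owner="growth-team",
    sql="SELECT COUNT(DISTINCT user_id) FROM events WHERE ts >= :week_start",
    lineage=("events",),
    status="certified",
))
```

Rejecting a duplicate name is the design choice that matters here: a second team that wants the metric looks it up with registry.get("weekly_active_users") rather than writing a competing definition.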

As the profession of data engineering matures, engineers need new tools to help them with each step in the process. The five trends that we are observing within the data world are the rise of those tools at each step. Here are those 5:

  1. New data pipelines use modern programming languages to create reusable abstractions for data processing, to monitor pipelines, and to visualize the flow of data as a DAG (directed acyclic graph). Innovators here are Dagster, Airflow, and Prefect.
  2. Compute engines query data in the cloud without having to move it. They leverage the separation of data and compute to accelerate queries, enable secure and compliant access, and future-proof the infrastructure for new tools and use cases that haven’t been built yet. Innovators are Dremio and Databricks.
  3. Data modeling curates a data catalog for all the metrics within a company. When metrics are modeled, they are defined once, accurately, and everyone uses that definition. Innovators are Transform Data and Looker (with LookML).
  4. Data products are analyses, experiments, reports, and machine learning models/products built on data. Innovators in this category include Preset, Streamlit, and Tecton among others.
  5. Data quality tools monitor data streams, identify anomalies, and create testing harnesses to ensure data is always accurate. Data quality innovators include MonteCarlo, SodaData, Great Expectations, and Data Gravity.
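To make the DAG idea from the first trend and the testing idea from the fifth more concrete, here is a minimal sketch that uses only Python's standard library. It is not how Dagster, Airflow, or Prefect work internally; the task names, the sample rows, and the run_pipeline helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ingest(results):
    # Hypothetical source rows; a real pipeline would read from a source system.
    return [
        {"user_id": 1, "amount": 20.0},
        {"user_id": 2, "amount": None},
        {"user_id": 3, "amount": 35.5},
    ]


def clean(results):
    # Drop rows with a missing amount.
    return [row for row in results["ingest"] if row["amount"] is not None]


def check_quality(results):
    # A tiny testing harness: return a list of problems found in the cleaned rows.
    rows = results["clean"]
    problems = []
    if not rows:
        problems.append("no rows after cleaning")
    if any(row["amount"] < 0 for row in rows):
        problems.append("negative amount")
    return problems


def total_revenue(results):
    return sum(row["amount"] for row in results["clean"])


TASKS = {
    "ingest": ingest,
    "clean": clean,
    "check_quality": check_quality,
    "total_revenue": total_revenue,
}

# Each task maps to the set of tasks that must finish before it runs.
DEPENDENCIES = {
    "ingest": set(),
    "clean": {"ingest"},
    "check_quality": {"clean"},
    "total_revenue": {"clean", "check_quality"},
}


def run_pipeline(tasks, dependencies):
    """Run every task after its dependencies, passing along earlier results."""
    results = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        results[name] = tasks[name](results)
    return order, results


order, results = run_pipeline(TASKS, DEPENDENCIES)
```

Declaring the dependencies as data, rather than burying them in the call order, is what lets a tool visualize the graph, rerun only the affected tasks, and place a quality check between cleaning and reporting.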

All the tools need to be synthesized to achieve the vision of a modern data mesh, and data engineers will pioneer that change.


Copyright © 2020 Tomasz Tunguz, All rights reserved.
You signed up to receive Ex Post Facto blog posts by submitting your email on tomtunguz.com
