The Five Important Trends in Data, and the One Megatrend Powering Them All

Jul 30, 2020 05:00 pm

Yesterday, Dremio hosted the Subsurface Conference, the first conference on cloud data lakes. More than 5000 people registered, and more than 2500 attended. If one had doubts that cloud data lakes are a strategic area for many in the data ecosystem, those figures should quash them.

I delivered a presentation at the end of the day that I’ll share here. Entitled 5 Data Trends You Should Know, the presentation covers the major trends we observe in the data world. Here’s a quick narrative of the talk.

There is a megatrend underpinning the changes in data design philosophy and tooling: the rise of the data engineer. Data engineers are the people who move, shape, and transform data from the source to the tools that extract insight. We believe data engineers are the change agents in a decade-long process that will revolutionize data.

Data systems used to be purchased by IT. But in the last 20 years, individual departments started to purchase their own data systems. Each team, using its own data systems, develops its own proprietary data products: analyses, dashboards, machine learning systems, even new product features.

Data systems rely on data from other teams. So all of these teams share data. And just like that, the company has built a data mesh: a network of producers and consumers of data who share data via standard APIs or open-source formats like Apache Arrow & Parquet. When the data is stored in the cloud, we call it a cloud data lake.

Data engineers stand on the shoulders of 70 years of software development experience and take many of the learnings from that discipline. One example is developing a data engineering lifecycle. This is our current understanding of a typical data engineering software development lifecycle.

There are six steps:

  1. Ingesting data from the systems that produce it and writing it into open formats in the cloud
  2. Planning the software to build
  3. Querying data using a compute engine which runs across the cloud data lake
  4. Modeling the data to ensure there is one centralized definition of every metric with an owner, a lineage, and a status
  5. Developing the data product, which could be an analysis, a BI report, a machine learning model, or a production feature
  6. Monitoring and testing the data to ensure data consistency & integrity over time
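
To make step 4 concrete, here is a minimal sketch in Python of what "one centralized definition of every metric with an owner, a lineage, and a status" could look like. Every name in it (Metric, MetricCatalog, Status, the monthly_revenue metric, the raw.orders table) is hypothetical and chosen for illustration; it is not the API of any tool mentioned in this post.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Status(Enum):
    """Hypothetical lifecycle states for a metric definition."""
    DRAFT = "draft"
    CERTIFIED = "certified"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Metric:
    """One metric, defined once, with an owner, a lineage, and a status."""
    name: str
    owner: str
    definition: str            # e.g. a SQL expression
    lineage: Tuple[str, ...]   # upstream tables the metric is derived from
    status: Status = Status.DRAFT


class MetricCatalog:
    """In-memory catalog that refuses a second definition of the same metric."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> Metric:
        return self._metrics[name]


# Illustrative usage with made-up values.
catalog = MetricCatalog()
catalog.register(
    Metric(
        name="monthly_revenue",
        owner="finance-data",
        definition="SUM(amount)",
        lineage=("raw.orders",),
        status=Status.CERTIFIED,
    )
)
```

Rejecting a duplicate name is the design choice that matters here: it is one simple way to enforce that a metric is defined once and that everyone reads the same definition.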

As the profession of data engineering matures, engineers need new tools to help them with each step in the process. The five trends that we are observing within the data world are the rise of those tools at each step. Here are those 5:

  1. New data pipelines use modern programming languages to create reusable abstractions for data processing, to monitor data pipelines, and to visualize the flow of data as a DAG (directed acyclic graph). Innovators here are Dagster, Airflow, and Prefect.
  2. Compute engines query data in the cloud without having to move it. They leverage the separation of data and compute to accelerate queries, enable secure and compliant access, and future-proof the infrastructure for new advances in tools and use cases that haven’t been built yet. Innovators are Dremio and Databricks.
  3. Data modeling curates a data catalog for all the metrics within a company. When metrics are modeled, they are defined once, accurately, and everyone uses that definition. Innovators are Transform Data and Looker (with LookML).
  4. Data products are analyses, experiments, reports, and machine learning models/products built on data. Innovators in this category include Preset, Streamlit, and Tecton among others.
  5. Data quality tools monitor data streams, identify anomalies, and create testing harnesses to ensure data is always accurate. Data quality innovators include MonteCarlo, SodaData, Great Expectations, and Data Gravity.
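
As a rough illustration of the first and fifth trends, the sketch below expresses a tiny pipeline as a DAG and runs a data quality check as one of its tasks. It uses only Python's standard library (graphlib, available in Python 3.9 and later). It is not how Dagster, Airflow, or Prefect work internally, and the task names, sample rows, and the rule that amounts must be non-negative are all assumptions made up for this example.

```python
from graphlib import TopologicalSorter


def ingest(ctx):
    # Stand-in for reading from a source system (hypothetical sample rows).
    ctx["rows"] = [
        {"order_id": 1, "amount": 20.0},
        {"order_id": 2, "amount": 35.5},
    ]


def check_quality(ctx):
    # Assumed rule: every amount must be present and non-negative.
    bad = [r for r in ctx["rows"] if r["amount"] is None or r["amount"] < 0]
    if bad:
        raise ValueError(f"{len(bad)} row(s) failed the quality check")


def compute_revenue(ctx):
    ctx["revenue"] = sum(r["amount"] for r in ctx["rows"])


def publish(ctx):
    ctx["report"] = f"revenue={ctx['revenue']:.2f}"


TASKS = {
    "ingest": ingest,
    "check_quality": check_quality,
    "compute_revenue": compute_revenue,
    "publish": publish,
}

# The DAG: each task maps to the set of tasks it depends on.
DEPENDS_ON = {
    "ingest": set(),
    "check_quality": {"ingest"},
    "compute_revenue": {"check_quality"},
    "publish": {"compute_revenue"},
}


def run_pipeline():
    """Run every task after the tasks it depends on and return the order used."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        TASKS[name](ctx)
    return order, ctx


order, ctx = run_pipeline()
print(order)
print(ctx["report"])
```

Because the quality check sits between ingestion and the revenue calculation, a bad row stops the run before it can reach a downstream data product.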

All the tools need to be synthesized to achieve the vision of a modern data mesh, and data engineers will pioneer that change.


Copyright © 2020 Tomasz Tunguz. All rights reserved.
You signed up to receive Ex Post Facto blog posts by submitting your email on tomtunguz.com
