📝 Guest post: Prevent AI failure with data logging and ML monitoring*
Monitoring and observability for AI applications are on every organization’s roadmap right now. In this guest post, our partner WhyLabs highlights the need for logging that is specific to data and machine learning. They describe whylogs, the open-source standard for data logging, which enables monitoring and more in AI and data applications. You can dive directly into whylogs with the getting started example or read on to learn more.

First, what is data logging?
Logs are an essential part of monitoring and observability for classic software applications. They let you diagnose what happened to an application, track changes over time, and debug issues as they arise. For ML applications, however, it’s not enough to log traditional software metrics such as uptime, bounce rate, and load time. Data behaves differently than code, so the application has to emit different signals. That’s where data logging comes in. Data logs capture data quality, data drift, model performance, and other data-specific health signals. With data logs, you can monitor model performance and data drift, validate data quality, track data for ML experiments, and put data auditing and governance best practices in place. But how can you implement data logging in practice?

That’s where whylogs comes in
The team at WhyLabs built whylogs as the open-source standard for data logging to address the logging needs of AI builders. With whylogs, the data flowing through AI and data applications is logged continuously: the library generates statistical summaries, called whylogs profiles, from data as it moves through your data pipelines and into your machine learning models. With these summaries, you can track how the data and the model change over time and pick up on issues such as data drift or degraded data quality.

Who uses whylogs?
Today, whylogs is used by thousands of AI builders, from startups to Fortune 100 companies.
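To make the statistical summaries described above more concrete, here is a minimal pure-Python sketch of a per-column summary. It is not the whylogs API or its internal data structure; `ColumnProfile`, its fields, and `profile_column` are hypothetical names chosen for illustration. It shows two properties that matter for data logging: each value is folded into the summary in a single pass with constant memory, so the same code can serve streaming and batch data, and two summaries can be merged, so profiles from separate batches or workers combine into one.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Hypothetical per-column summary; NOT the whylogs data structure."""

    count: int = 0             # all values seen, including nulls
    nulls: int = 0             # values that were None
    total: float = 0.0         # sum of non-null values, used for the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: Optional[float]) -> None:
        """Fold one value into the summary in O(1) time and memory."""
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        non_null = self.count - self.nulls
        return self.total / non_null if non_null else math.nan

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Combine two summaries, e.g. from two batches or two workers."""
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def profile_column(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Build a summary by streaming over the values once."""
    profile = ColumnProfile()
    for value in values:
        profile.track(value)
    return profile


# Two batches from the same pipeline, profiled separately and merged later.
monday = profile_column([10.0, 12.5, None, 11.0])
tuesday = profile_column([40.0, 38.5, 41.0])
combined = monday.merge(tuesday)
print(combined.count, combined.nulls, combined.minimum, combined.maximum, combined.mean)
```

Because merging yields the same summary as profiling the concatenated data, summaries can be computed close to where the data flows and aggregated later. A production profiling library would typically also keep approximate sketches for quantiles, distinct counts, and frequent values, which this sketch omits.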
They apply it to a wide variety of use cases, data types, and model types. We designed whylogs in collaboration with data scientists, data engineers, and machine learning engineers across the ML community. The mission of whylogs is simple: create a platform-agnostic library that captures all key statistical properties of a dataset. By design, whylogs works natively with both streaming and batch data. It works out of the box on tabular, image, and text data, and it can be extended to handle arbitrary data types, such as embeddings, audio, and video. The library runs natively in Python or Java environments, so it can be used with pandas, Dask, Modin, Ray, Apache Spark, and many other data storage and processing tools. With over 35 integrations available today and more in progress, whylogs is likely to work with whatever tool stack you’re using.

What can I use whylogs for?
Setting up whylogs takes less time than brewing a cup of coffee. Simply
After generating a whylogs profile, you can:
These three functionalities unlock a variety of use cases for data scientists, data engineers, and machine learning engineers:
What’s new with whylogs v1?
With the launch of whylogs v1, WhyLabs released a host of new features that improve the library in four key development areas:
API Simplification
Generating whylogs profiles takes a single line of code. Logging your data is as easy as running
The simplified API makes logging data even easier for data scientists and ML engineers. This means higher data quality, more ML models reliably deployed in production, and more best practices followed.

Profile Constraints
With profile constraints, you can save hours of extra work by catching data bugs before they have a chance to propagate through the entire data pipeline. Simply define tests for your data and get alerted if the data doesn’t look the way you expect. This enables data unit testing and data quality validation. Constraints might look like:
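As a rough illustration of the idea, and not the whylogs constraints API, a constraint can be modeled as a named predicate evaluated against a column’s summary statistics, with any failure raising an alert. The `Constraint` class, the `validate` function, the column names, and the summary values below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Summary statistics for one column, as a profile might hold them (illustrative).
Summary = Dict[str, float]


@dataclass
class Constraint:
    """Hypothetical named check against one column's summary statistics."""

    name: str
    column: str
    check: Callable[[Summary], bool]


def validate(summaries: Dict[str, Summary], constraints: List[Constraint]) -> List[str]:
    """Return the names of failed constraints; an empty list means all passed."""
    failures = []
    for constraint in constraints:
        summary = summaries.get(constraint.column)
        # A missing column counts as a failure rather than being silently skipped.
        if summary is None or not constraint.check(summary):
            failures.append(constraint.name)
    return failures


constraints = [
    Constraint("age is never negative", "age", lambda s: s["min"] >= 0),
    Constraint("age has no nulls", "age", lambda s: s["null_count"] == 0),
    Constraint("income mean below 1e6", "income", lambda s: s["mean"] < 1e6),
]

todays_summaries = {
    "age": {"min": -3.0, "max": 97.0, "null_count": 0.0, "mean": 41.2},
    "income": {"min": 0.0, "max": 250000.0, "null_count": 12.0, "mean": 58000.0},
}

failed = validate(todays_summaries, constraints)
if failed:
    print("ALERT:", failed)
```

Because the checks run against summaries rather than raw rows, they stay cheap even when the underlying data is large.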
Better yet, setting up constraints is even easier with the
To get started with constraints, check out this example notebook.

Profile Visualizer
In addition to getting notified automatically about potential issues in your data, it’s useful to inspect the data visually. With the profile visualizer, you can generate interactive reports about a single profile, or compare two profiles, directly in your Jupyter notebook environment. This enables exploratory data analysis, data drift detection, and data observability. The profile visualizer lets you create some useful visualizations of your data:
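Comparing two profiles for drift ultimately means measuring how far apart two distributions are. This minimal sketch bins a reference sample and a current sample on the same edges and scores the difference with total variation distance (half the L1 distance between the binned distributions, 0 for identical and 1 for disjoint). The metric, the bin edges, the sample values, and the 0.3 alert threshold are assumptions made for illustration; the whylogs profile visualizer may use different drift statistics. In a profile-based workflow, the histograms would come from stored summaries rather than raw values.

```python
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return the fraction of values in each bin defined by sorted `edges`.

    There are len(edges) + 1 bins: values below edges[0] go to the first
    bin and values at or above edges[-1] go to the last, so none are dropped.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Half the L1 distance between two binned distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


edges = [20, 40, 60, 80]
reference = [25, 31, 45, 52, 58, 63, 47, 38, 55, 42]  # e.g. ages seen last month
current = [65, 71, 82, 77, 59, 68, 90, 73, 61, 85]    # e.g. ages seen today

drift = total_variation(histogram(reference, edges), histogram(current, edges))
print(f"drift score: {drift:.2f}")
if drift > 0.3:  # hypothetical alert threshold; tune per feature
    print("possible data drift")
```

Total variation distance is used here only because it is bounded and needs no smoothing for empty bins; other common choices compare the same binned distributions differently.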
To learn more about the profile visualizer, check out this example notebook.

Performance improvements
With the performance improvements in v1, you can now profile 1M rows per second. Across all benchmarks, v1 showed a more than 500x improvement in profiling large datasets compared to previous versions. Wow! The secret to this speedup is vectorization: the library now relies on lightning-fast C code for data summarization. The time it takes to profile a dataset grows sub-linearly, so larger datasets take less time per row to profile than smaller ones, while smaller datasets are still easily profiled in under a second. With this level of performance, whylogs can serve teams working with data of any size, from a few million rows per week to billions of transactions per minute.

Conclusion
The whylogs project is the open-source standard for data logging, enabling applications that range from data quality validation to ML model monitoring. With whylogs v1, users get more value from the library than ever before. You can get started with the library or check out the whylogs GitHub to learn more.

*This post was written by the WhyLabs team. We thank WhyLabs for their ongoing support of TheSequence.