📝 Guest post: 5 Principles You Need To Know About Continuous ML Data Intelligence*
In this article, Vikram Chatterji, founder and CEO of Galileo, discusses the problems with ML data blind spots and introduces ML Data Intelligence, which helps an ML team holistically understand and improve the health of the data powering ML across the organization.
When I was a product leader at Google AI, my team and I were responsible for building models that would ‘just work’. They needed to ‘just work’ because we were selling to highly regulated industries like financial services and healthcare, where the price to pay for poor or biased predictions is very steep. Over and over again, we would think our model ‘worked’ because of high values on vanity metrics such as F1 or confidence scores, but within days we would discover issues with our data. It didn’t matter what other shiny tools we used for training, deploying or monitoring models: if the data was erroneous, the model would suffer. And data can be ‘erroneous’ in dozens of ways, which made this a hard problem.
It turned out that this problem was not unique to Google. Over the past year, after speaking with hundreds of ML leaders, we realized that analyzing and fixing the data across the ML workflow, or continuous ML Data Intelligence, is their top problem.
What tools did we use at Google, and what do these hundreds of ML teams use, for ML data intelligence? Sheets and scripts are still the state of the art! This has many problems.
What is ML data intelligence? The 5 Principles.
ML data intelligence is a team’s ability to holistically understand and improve the health of the data powering ML across the organization. It removes data biases and production mishaps proactively, saving data scientists hundreds of hours, lowering costs dramatically and improving model predictions quickly, sometimes on the order of 10-12% or more.
ML data intelligence tools are embedded in the model training and production environments. They quickly identify data errors using built-in data-centric AI techniques, and they systematically enable data fixing, with actionability and collaboration as key cornerstones.
ML data intelligence is one of the first tools that companies need when embarking on the ML journey, even before labeling or figuring out which model to use. Understanding the health of the data first, and then fixing and improving it, sets a good foundation for smarter data sampling for annotation (thereby saving on labeling costs).
The five pillars of ML data intelligence are:
ML data intelligence vs Data Quality
The quality of the data relies on being able to identify noise and errors fast, whether in the data dump you get from a customer or in the data the model is getting hit with in production. ‘Data quality’ is abstract but critical. It needs constant supervision, analysis and adaptation of the data to ensure it keeps trending up and to the right. Data quality is a byproduct of ML data intelligence, which provides a framework to inspect, analyze and fix the data to ensure high data quality across the ML workflow.
ML data intelligence vs ML Monitoring
When we think of ‘ML monitoring’, we tend to picture tools such as Datadog, where incredible dashboards constantly monitor production and alert ML teams to model downtime. This has two problems:
Moreover, while ML monitoring tools focus on the ML Engineer/Program Manager, ML data intelligence tools focus squarely on the data scientist, serving as an assistant for continuous data analysis and fixing.
The future of ML data intelligence
ML data intelligence is a rapidly maturing but still evolving space. Most job functions, as they grow in prominence within an organization, become more data-driven in their decision-making over time. This has always required a new set of tools to step up and enable the shift.
Similarly, ML teams have become a mainstay for organizations, and now deserve the tools to quickly inspect, fix and track the data they are working with. This ‘data stack’ in the ML developer’s toolkit will be powered by innovations in data-centric AI research (academia has a growing focus here), as well as a growing understanding that fixing the data can lead to huge gains in model performance. But to ‘fix’, you first need to ‘understand’. ML data intelligence will enable both for the data scientist, ushering in the data-driven ML mindset.
To learn more, reach out to me, Vikram Chatterji. I will be happy to discuss how we solve the challenges of ML Data Intelligence with Galileo.
*This post was written by Vikram Chatterji, founder and CEO of Galileo. We thank Galileo for their support of TheSequence.