🟧 Edge#192: Inside Predibase, the Enterprise Declarative ML Platform
Was this email forwarded to you? Sign up here

On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI and introduce you to the platforms that tackle ML challenges.

💥 Deep Dive: Inside Predibase, the enterprise declarative machine learning platform

Low-code ML platforms have received a lot of attention in the past few years, but haven’t yet achieved widespread adoption. Predibase aims to deliver a high-performance, low-code approach to machine learning (ML) for individuals and organizations who have tried operationalizing ML but found themselves reinventing the wheel at each step of the way. Just as infrastructure-as-code simplified IT, Predibase’s declarative approach allows users to focus on the “what” of their ML tasks while leaving its system to figure out the “how”.

Where things go wrong today

Building ML solutions at organizations today is time-consuming and requires specialized expertise. After several months of development, the result is typically a bespoke solution that is handed over to other engineers, is often hard to maintain in the long term, and creates technical debt. The founders of Predibase see this as the COBOL era of machine learning, and believe the field needs its “SQL moment”. This is a familiar pain for data science leaders, but many have been equally disenchanted by low-code/no-code automated machine learning solutions that haven’t scaled to the needs of their organizations. Often, these tools are used for prototyping but fall short of being promoted to production. Furthermore, the tools that are built for scale (Spark, Airflow, Kubeflow) are not the same tools that are built for experimentation.
The path of least resistance in most data science teams becomes downloading some subset of the data to a local laptop, training a model using some amalgamation of Python tools like Jupyter, Pandas, and PyTorch, and then throwing the model over the wall to an engineer tasked with putting it in production. The solution is to strike the right abstraction for both ML modeling and infrastructure – one that provides an easy out-of-the-box experience while supporting increasingly complex use cases and allowing users to iterate on and improve their solutions.

Declarative ML Systems: LEGO for Machine Learning

The basic idea behind declarative ML systems is to let users specify entire model pipelines as configurations, being intentional about the parts they care about while automating the rest. These configurations allow users to focus on the “what” rather than the “how” and have the potential to dramatically increase access and lower time-to-value. Declarative ML systems were pioneered by Ludwig at Uber and Overton at Apple (see the interview we did last year with Piero Molino, creator of Ludwig and CEO of Predibase, about Ludwig and the importance of low-code ML). Ludwig served many production applications, ranging from customer support automation and fraud detection to product recommendation, while Overton processed billions of queries across multiple applications. Both frameworks made ML more accessible across stakeholders, especially engineers, and accelerated the pace of projects. Predibase is built on top of Ludwig, which allows users to define deep learning pipelines with a flexible and straightforward configuration system suitable for a wide variety of tasks. Depending on the data schema, users can compose and train state-of-the-art model pipelines on multiple modalities at once. Writing a configuration file for Ludwig is easy, and provides users with ML best practices out of the box, without sacrificing control.
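To make the idea concrete, here is a minimal sketch of what such a declarative pipeline spec looks like, written as a plain Python dict. The top-level sections (input_features, output_features, trainer) follow Ludwig’s documented configuration schema, but the specific feature names and option values here are illustrative, not taken from a real project:

```python
# A minimal Ludwig-style declarative pipeline spec as a plain Python dict.
# Section names follow Ludwig's documented schema; the feature names and
# option values are illustrative.
config = {
    "input_features": [
        {"name": "review_text", "type": "text", "encoder": "parallel_cnn"},
        {"name": "account_age_days", "type": "number"},
    ],
    "output_features": [
        {"name": "churned", "type": "binary"},
    ],
    "trainer": {
        "epochs": 10,
        "learning_rate": 0.001,
    },
}

# The user declares *what* the pipeline should do; preprocessing, model
# assembly, and the training loop are derived from this spec by the framework.
print(config["input_features"][0]["type"])  # -> text
```

Note that the spec mixes modalities freely – a text feature and a numeric feature feed a single binary output – which is exactly the multi-modal composition described above.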
Users can choose which parts of the pipeline they want to swap new pieces into – including choosing among state-of-the-art model architectures and training parameters, deciding how to preprocess data, and running a hyperparameter search – all via simple config changes. This declarative approach increases the speed of development, makes it easy to improve model quality through rapid iteration, and makes it effortless to reproduce results without the need to write any complex code. One of Ludwig’s open-source users referred to composing these configurations as “LEGO for deep learning”. But as any ML team knows, training a deep learning model isn’t the only hard part – building the infrastructure to operationalize the model from data to deployment is often even more complex. That’s where Predibase comes in.

Predibase – Bringing declarative ML to the enterprise

Predibase brings the benefits of declarative ML systems to market with an enterprise-grade platform. There are three key things users do in Predibase: connect their data, build models declaratively, and operationalize those models in production.
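Returning to the point about swapping pipeline pieces: in code terms, iterating on a model is just producing a new config with a small edit. The snippet below sketches this with plain dicts and a hypothetical `override` helper – it is not part of Ludwig’s or Predibase’s API, only an illustration of why config edits replace pipeline rewrites:

```python
from copy import deepcopy

def override(config, path, value):
    """Return a copy of `config` with the value at dotted `path` replaced.
    Hypothetical helper for illustration; in a declarative ML framework the
    same effect is a one-line edit to the config file."""
    new = deepcopy(config)
    node = new
    keys = path.split(".")
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = value
    return new

base = {
    "input_features": [
        {"name": "review_text", "type": "text", "encoder": "parallel_cnn"}
    ],
    "trainer": {"learning_rate": 0.001},
}

# Two small edits -- a new learning rate and a different text encoder --
# yield a new model version without touching any pipeline code.
variant = override(base, "trainer.learning_rate", 0.0001)
variant["input_features"][0]["encoder"] = "bert"

print(base["input_features"][0]["encoder"])     # parallel_cnn (unchanged)
print(variant["input_features"][0]["encoder"])  # bert
```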
Predibase's vision is to bring all the stakeholders of data & AI organizations together in one place, making collaboration seamless between data scientists working on models, data engineers working on deployments, and product engineers using the models. The four pillars added on top of its open-source foundations to make this vision a reality are:
Predibase connects directly to your data sources, both structured data warehouses and unstructured data lakes. Any model trained in Predibase can be deployed to production with zero code changes and configured to automatically retrain as new data comes in because both experimentation and productionization go through the same unified declarative configuration.
Predibase features a cloud-native serverless infrastructure layer built on top of Horovod, Ray, and Kubernetes. It provides the ability to autoscale workloads across multi-node and multi-GPU systems in a way that is cost-effective and tailored to the model and dataset. This combines highly parallel data processing, distributed training, and hyperparameter optimization into a single workload, and supports both high throughput batch prediction as well as low-latency real-time prediction via REST.
The declarative abstraction that Predibase adopts makes it easy for users to modify model pipelines by editing their configuration. Defining models as configs allows Predibase to show differences between model versions over time in a concise way, making it easier to iterate on and improve them. It also enables a unique alternative to AutoML: instead of running expensive experiments, Predibase suggests the best subsequent configurations to train based on the explorations already conducted, creating a virtuous cycle of improvement.
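Because model versions are just configs, comparing two versions reduces to diffing nested dictionaries. The sketch below (a hypothetical helper, not Predibase’s implementation) shows why version-to-version differences stay concise enough to review at a glance:

```python
def config_diff(old, new, prefix=""):
    """Recursively list (path, old_value, new_value) for leaves that differ.
    Illustrative sketch of diffing two declarative model configs."""
    diffs = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(config_diff(a, b, prefix=path + "."))
        elif a != b:
            diffs.append((path, a, b))
    return diffs

v1 = {"trainer": {"learning_rate": 0.001, "epochs": 10}}
v2 = {"trainer": {"learning_rate": 0.0001, "epochs": 10}}

for path, a, b in config_diff(v1, v2):
    print(f"{path}: {a} -> {b}")
# trainer.learning_rate: 0.001 -> 0.0001
```

A training run that changed dozens of lines of pipeline code would be far harder to summarize than this single-line config delta, which is also what makes suggesting the next configuration to try tractable.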
With the rise of the modern data stack, the number of data professionals comfortable with SQL has also grown. So, alongside its Python SDK and UI, Predibase also introduces PQL – Predictive Query Language – as an interface that brings ML closer to the data. Using PQL, users can train models and run predictive queries through a SQL-like syntax they are already familiar with.

Conclusion

Declarative machine learning systems have dramatically increased the velocity and lowered the barrier to entry for machine learning projects at leading tech companies, and now Predibase is bringing the approach to all organizations with its enterprise platform built on open-source foundations. Predibase is currently available by invitation only; you can request a demo here: https://predibase.com/request-early-access

You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.