🎙Fabio Buso about How Hopsworks Feature Store Became Fully Serverless
Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration. Share this interview if you like it. No subscription is needed.
👤 Quick bio / Fabio Buso
Fabio Buso is VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data-intensive applications.
Fabio Buso (FB): I got started in machine learning the old-fashioned way: with Andrew Ng’s ML course on Coursera. I’ve always been fascinated by the data side of ML applications. During my master's I had a minor in data-intensive applications. It was during my internship that I met the folks at Hopsworks, and I started working with them in what were the early days of the company. Since then, I have led several projects, from infrastructure all the way to the development of the Hopsworks Feature Store. Now I’m looking after the entire engineering team. Working closely with customers and users, I’m always impressed by the models they build and the challenges they solve.
🛠 ML Work
FB: The first two versions of the Hopsworks feature store were Spark-centric. Most of the operations to create features and training datasets required data scientists to interact directly with Spark. Many data scientists have a soft spot for Pandas and can be very creative in avoiding Spark, even when Spark is needed. Already in Hopsworks 2.x we started building support for pure Python clients. With Hopsworks 3.x this capability has reached a level of maturity that allows data scientists not only to write feature pipelines with Pandas, but also to create training datasets and perform batch and online inference from Python. The Python API improvements were part of the bigger theme of Hopsworks 3.0: improving the data science experience. The new release also brings more mature model serving infrastructure built on KServe. We put particular focus on a tight connection between the model serving infrastructure and the feature store. This allows data scientists to deploy models at scale on KServe with easy access to the precomputed features that power real-time predictions.
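To make "feature pipelines with Pandas" concrete, here is a minimal sketch of the kind of transformation a data scientist might write before registering the result in a feature store. It is plain Pandas and does not call the Hopsworks client; the function name, column names, and sample data are all assumptions made for illustration.

```python
import pandas as pd


def build_customer_features(transactions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions into one feature row per customer.

    Hypothetical schema: customer_id, amount, timestamp.
    """
    return (
        transactions.groupby("customer_id")
        .agg(
            txn_count=("amount", "size"),      # number of transactions
            total_spend=("amount", "sum"),     # lifetime spend
            avg_spend=("amount", "mean"),      # average basket size
            last_txn_at=("timestamp", "max"),  # recency signal
        )
        .reset_index()
    )


# Toy input standing in for a raw data source.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "amount": [10.0, 30.0, 5.0],
        "timestamp": pd.to_datetime(["2022-08-01", "2022-08-03", "2022-08-02"]),
    }
)
customer_features = build_customer_features(transactions)
```

The resulting DataFrame, keyed by `customer_id`, is the sort of object that would then be handed to a feature store client for persistence.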
FB: The serverless architecture is also part of the developer experience work we did for Hopsworks 3.0. Hopsworks brings lots of cool functionality to the feature engineering pipelines (a UI to discover and collaborate with your team, lineage, tags, and more). Serverless is the answer to the question we posed ourselves: how can we make it easier for data scientists to start building features and leverage the functionalities Hopsworks provides, without the need to deploy and manage a Hopsworks platform? Today with Hopsworks Serverless, data scientists can be up and running with Hopsworks in a matter of seconds, without the need to connect any cloud account or install any software.
FB: The new API architecture has the same philosophy as frameworks like PyTorch and TensorFlow. There, data scientists interact with the Python API, while the heavy lifting is done by optimized C++ routines running on different hardware that users never have to touch directly. Something similar happens in Hopsworks. Users create and register features with Python, and in the backend Hopsworks leverages Spark to persist those features in the feature groups and to join features together when creating a training dataset. Users don’t need any knowledge of Spark to use the feature store. For advanced use cases that require extremely fresh features or large feature engineering pipelines, users can still interact directly with Spark, as was the case in previous versions of Hopsworks.
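Conceptually, joining features together into a training dataset amounts to a keyed join between labels and one or more feature tables. The sketch below shows that idea in plain Pandas; it is not Hopsworks or Spark code, and the function name, key, and columns are hypothetical.

```python
import pandas as pd


def join_feature_groups(labels: pd.DataFrame, feature_tables, key: str) -> pd.DataFrame:
    """Left-join each feature table onto the labels using a shared key."""
    training = labels
    for table in feature_tables:
        # A left join keeps every labeled row, even if some features are missing.
        training = training.merge(table, on=key, how="left")
    return training


# Hypothetical labels and two hypothetical feature tables.
labels = pd.DataFrame({"customer_id": [1, 2], "churned": [0, 1]})
spend = pd.DataFrame({"customer_id": [1, 2], "avg_spend": [20.0, 5.0]})
profile = pd.DataFrame({"customer_id": [1, 2], "tenure_months": [12, 3]})

training_df = join_feature_groups(labels, [spend, profile], key="customer_id")
```

A client API in this style lets the user describe which features to combine while a backend engine decides how to execute the join at scale.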
FB: Istio is the go-to secure service mesh in K8s. KServe is the de facto framework to deploy models on K8s and make those models available to users through REST APIs. KServe leverages Istio to provide service discovery, request routing, and secure external endpoints. Both Istio and KServe are widely used and battle-tested, making them good candidates to run production-grade model serving infrastructure. What we did in Hopsworks is extend KServe with access control based on Hopsworks API keys and with feature/prediction logging, seamlessly integrating it with the Hopsworks ecosystem. Hopsworks’ model registry is designed for managing versioned KServe deployments (artifacts, transformers, predictors), and the Hopsworks Python API gives you secure access to both the feature store and model deployments on KServe. Most importantly, we worked on using the feature store to provide the models deployed on KServe with the real-time data they need to make predictions. At the same time, Hopsworks provides the infrastructure to log predictions back to the feature store, enabling analysis and debugging of models, and even producing new feature data for models.
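As a rough illustration of how precomputed features feed a real-time prediction, the sketch below looks up a feature vector by entity key and builds a request for KServe's V1 REST protocol (`POST /v1/models/<name>:predict` with an `instances` payload). Several things here are assumptions: the in-memory dictionary stands in for an online feature store, the host, model name, and feature names are hypothetical, and the `Authorization: ApiKey ...` header format should be checked against the deployment's documentation. The request is only constructed, not sent.

```python
import json

# Stand-in for an online feature store: precomputed features keyed by entity id.
ONLINE_FEATURES = {
    42: {"txn_count": 2, "avg_spend": 20.0},
}

# The model expects features in a fixed order (hypothetical).
FEATURE_ORDER = ["txn_count", "avg_spend"]


def build_predict_request(host: str, model_name: str, entity_id: int, api_key: str):
    """Build URL, headers, and JSON body for a KServe V1 predict call."""
    features = ONLINE_FEATURES[entity_id]
    vector = [features[name] for name in FEATURE_ORDER]
    url = f"https://{host}/v1/models/{model_name}:predict"
    headers = {
        "Content-Type": "application/json",
        # Assumed header format for API-key access control.
        "Authorization": f"ApiKey {api_key}",
    }
    body = json.dumps({"instances": [vector]})
    return url, headers, body


url, headers, body = build_predict_request(
    "serving.example.com", "churn", entity_id=42, api_key="<API_KEY>"
)
```

The caller only supplies an entity key; the feature values come from the lookup, which is the point of keeping the serving layer close to the feature store.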
FB: From a technology perspective, I expect all platforms to start focusing on data scientists as well and to lower the barrier to adoption, as we did with Hopsworks 3.0. The market is already going through a consolidation phase: several startups that offered feature stores last year no longer do so this year. The remaining vendors are maturing their platforms and adding use cases. As in any other segment, I expect some players not to make it, and I would not be surprised if a big player makes a move to augment or bootstrap its ML/AI capabilities.
💥 Miscellaneous – a set of rapid-fire questions
"All horses are the same color", an attempt to use proof by induction to show that the horses in any given group all have the same color. It seems reasonable at first, but the inductive step breaks down when going from one horse to two: the two one-horse subgroups have no horse in common, so nothing forces their colors to match.
Feature Engineering Bookcamp by Sinan Ozdemir, Manning. In particular, the hands-on chapters that use Hopsworks!
Chatbots, and even GPT-3, have become very good at fooling humans into thinking they are talking to other humans. However, they were developed for this very purpose, so it is hard to infer intelligence from that alone.
Diffusion models are all the hype these days. It will be interesting to see how far the community will be able to push these models and, more importantly, if we’ll be able to make them cheaper to train.