📝 Guest Post: Guide to Building an ML Platform*
In this guest post, Stephen Oladele, Developer Advocate and MLOps Technical Content Creator, together with neptune.ai, dives into the topic of building machine learning platforms and shares some best practices followed by the industry. Make sure to continue reading!

Machine learning (ML) platforms are increasingly seen as the solution to consolidating all the components of the ML model lifecycle, from experimentation to production. These platforms not only provide your team with the tools and infrastructure they need to build and operate models at scale but also apply standard engineering and MLOps principles to all use cases.

However, there's a catch: understanding what makes a successful ML platform and building one is no easy task. With a plethora of tools, frameworks, practices, and technologies available, it can be overwhelming to know where to begin. This guide is designed to help you navigate the process and understand the key factors that contribute to a successful machine learning platform.

What is a machine learning platform?

An ML platform standardizes the technology stack for your data team around best practices to reduce incidental complexities with machine learning and better enable capabilities for data science teams across projects and workflows.

Why are you building an ML platform? We ask this during product demos, user and support calls, and on our MLOps Live podcast. Generally, people say they do MLOps to make the development and maintenance of production machine learning seamless and efficient.

ML platforms should make machine learning operations (MLOps) easier at every stage of a machine learning project's life cycle, from prototyping to production at scale, as the number of models in production that have a positive effect on the business grows from one or a few to tens, hundreds, or thousands. An ML platform should be designed to:
But how do you do it?

MLOps best practices, learnings, and considerations from ML platform experts

We have distilled some of the best practices and learnings from ML platform teams into the following points.

Embrace iteration on your ML platform

As with any other software system, creating your ML platform shouldn't be a one-off task. As your business needs, infrastructure, teams, and workflows evolve, you should keep making changes to your ML platform.

Initially, you may not have a clear vision of what your ideal ML platform should look like. However, by building something that works and consistently improving it, you should be able to create a platform that supports your data scientists and provides business value.

Isaac Vidas, ML Platform Lead at Shopify, shared at Ray Summit 2022 that Shopify's ML Platform had to go through three different iterations:

“Our ML platform has gone through three iterations in the past. The first iteration was built on an in-house PySpark solution. The second iteration was built as a wrapper around the Google AI Platform (Vertex AI), which ran as a managed service on Google Cloud. We reevaluated our machine learning platform last year based on various requirements gathered from our users and data scientists, as well as business goals. We decided to build the third iteration on top of open source tools and technologies around our platform goals, with a focus on scalability, fast iterations, and flexibility.”

Take, for example, Airbnb. They have built and iterated on their ML platform up to three times over the course of the entire project. The platform should evolve as the number of use cases your team solves increases.

Be transparent to your users about true infrastructure costs

Another good idea is to make sure that all of your data scientists can see the cost estimate for every job they run in their workspace. This could help them learn how to manage costs better and use resources efficiently.
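As a rough illustration of what a per-workspace cost estimate might look like, here is a minimal Python sketch. It is not Shopify's implementation: the `WorkspaceSpec` class, its fields, the flat per-node hourly rate, and the example prices are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceSpec:
    """Hypothetical description of a user workspace (all fields are assumptions)."""
    hourly_rate_usd: float  # assumed flat price per node per hour
    node_count: int         # number of nodes the workspace runs on
    max_age_hours: float    # assumed lifetime after which the workspace is shut down


def estimate_job_cost(spec: WorkspaceSpec, expected_hours: float) -> float:
    """Estimate the cost of one job as rate x nodes x expected runtime."""
    if expected_hours < 0:
        raise ValueError("expected_hours must be non-negative")
    return round(spec.hourly_rate_usd * spec.node_count * expected_hours, 2)


def max_workspace_cost(spec: WorkspaceSpec) -> float:
    """Upper bound on cost: the workspace running for its full maximum age."""
    return round(spec.hourly_rate_usd * spec.node_count * spec.max_age_hours, 2)


if __name__ == "__main__":
    # Example values are made up for illustration only.
    spec = WorkspaceSpec(hourly_rate_usd=2.5, node_count=4, max_age_hours=8.0)
    print(f"Estimated job cost: ${estimate_job_cost(spec, expected_hours=1.5):.2f}")
    print(f"Maximum workspace cost: ${max_workspace_cost(spec):.2f}")
```

A real platform would likely pull rates from the cloud provider's billing data and account for storage, networking, and discounts; the point here is only that a known maximum runtime makes an upper bound easy to show in the user's workspace.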
“We recently included cost estimations (in every user workspace). This means the user is very familiar with the amount of money it takes to run their jobs. We can also have an estimation for the maximum workspace age cost, because we know the amount of time the workspace will run…”
- Isaac Vidas, ML Platform Lead at Shopify, at Ray Summit 2022

Documentation is important on and within your platform

Documentation is crucial for any software, including ML platforms. It should be intuitive and comprehensive to facilitate ease of use and adoption by your users.

To ensure clarity, you can explicitly specify which parts of the platform are not yet perfected and make it easy for users to tell errors caused by their own workflows apart from errors caused by the platform. Quick-start guides and easy-to-read how-tos can aid in the successful adoption of the platform.

Within the platform, it should also be easy for users to document their workflows. For instance, adding a notes section to the interface for the experiment management component could benefit data scientists.

Documentation should start from the architecture and design phases, which enables you to:
Tooling and standardization are key

Standardizing workflows and tools on your platform can increase team efficiency, enable the use of the same workflows for multiple projects, simplify the development and deployment of ML services, and improve collaboration. Learn more from Uber Engineering's former senior software engineer, Achal Shah.

Be tool agnostic

Be tool-agnostic to facilitate faster adoption and cross-functional team collaboration. Integrating your platform with the organization's existing stack means users do not have to learn entirely new tools before they can be productive. Forcing them to start from scratch with unfamiliar tools is bound to be a lost cause.

Make your platform portable

Ensure that your platform is portable across different infrastructures, so that a platform that initially runs on the organization's own infrastructure layer is not difficult to move to a new one. Most open-source, end-to-end platforms are portable, and you can use their solutions or design principles as a guide to build your own platform.

What's next?

Best practices are just the tip of the iceberg and only a small part of the full Guide to Building an ML Platform that you can find on Neptune's MLOps Blog. It's a huge resource that talks about:
There are also a ton of links to additional resources, like articles, podcasts, whitepapers, and more.

*This post was written by Stephen Oladele, Developer Advocate and MLOps Technical Content Creator, for neptune.ai. We thank neptune.ai for their ongoing support of TheSequence.