🔵⚪️Edge#136: Kili Technology and Its Automated Data-Centric Training Platform
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention. It helps you become smarter about ML and AI. 💥 What’s New in AI: Kili Technology Presents an Automated Data-Centric Training PlatformData training is one of the fastest-growing segments of the machine learning (ML) market. Supervised models’ dependency in labeled datasets has made data training one of the key components of modern ML pipelines. As a result, there has been an explosion of data training platforms looking to automate the creation of labeled datasets as well as its integration in training pipelines. In such a crowded space, it's hard to determine signals from noise and identify platforms that have meaningful traction, ambitious roadmap, and strong financial backing to remain relevant in years to come. Today, we would like to discuss Kili Technology. This automated data-centric AI training platform has been flying under the radar but has steadily become one of the most important automated data training stacks on the market. While AI has focused mostly on models, the real-world experience of those who put models into production shows that, most of the time, data is more important. When a system isn't working well, many teams instinctively try to improve the code, whereas, for many practical applications, it's more effective to focus on improving the data because the training data is the new code. What is an Automated Data-Centric Training Platform?Data-centric is defined by Stanford Professor Andrew Ng as the practice of systematically engineering the data used to build AI systems. Kili is designed to serve this approach with two key features: - the ability to control the consistency of the data. It offers three key features: quality indicators such as consensus or honeypot to assess the consistency of the annotated data at all levels (project, asset, labeler, annotation), instructions to align annotators on a common definition of truths and a review workflow to disambiguate conflicts. - the ability to control the completeness of the data, i.e., build a training dataset iteratively as a collection of scenarios to be covered. On the one hand, it offers a search engine that allows fine-tuning of model quality parameters to identify data slices on which the model systematically underperforms and, on the other hand, annotation automation tools (e.g., interactive annotation) to quickly fill in the poorly covered data scenarios. Kili Technology is a platform for automating the creation of high-quality training datasets for image, video, documents, time series, and voice datasets. Providing simple and easy-to-use user experiences for different datasets has been one of the main limitations of data labeling platforms. Kili addresses this limitation head-on by enabling a set of intuitive user experiences to automate data annotation tasks in a collaborative fashion. Additionally, Kili simplifies the end-to-end training data management process while enforcing robust access control and security policies. The startup has meaningful customer adoption with companies such as IBM, SAP, OVH and Blue Prism and recently raised a $25 million Series A. Customer traction and funding are a great validation of a company, but the most important part is to evaluate the platform in its technical merits. Let's dive in. CapabilitiesA way to think about Kili Technology is as automation workflows for the two main stages of ML models: training and production. During the training phase, users create a project using Kili's web interface and upload their target datasets. Depending on the type of data they require to annotate (image, videos, text, audio), Kili will leverage user interfaces optimized for the specifics of that dataset. Using those interfaces, users can collaborate in the labeling process that culminates with the creation of the training dataset. As a training data management tool, Kili fits into the MLOps value chain between upstream data storage and downstream model training. Highly Versatile PlatformThe previous workflows are complemented with a series of intuitive user experiences and tools that streamline labeling tasks for different types of datasets, such as the following:
ArchitecturePowering the tools and workflows in the Kili Technology platform, there is an architecture that works in on-premise and cloud deployments. Here are a few interesting points about the Kili's architecture:
This modern architecture allows Kili Technology to adapt to different infrastructures. The platform can be provisioned in a SaaS model using Google Cloud as well as on hybrid or on-premise models. Kili PlaygroundOne of the most interesting components of the Kili Technology architecture is the Kili Playground. Released as an open-source stack, Kili Playground is a Python library that abstracts the interaction with Kili’s GraphQL API making it easier for data scientists and machine learning engineers to build data labeling processes using a few lines of code. ConclusionThree years ago, the annotation market was embryonic. Today, the first end-to-end data-centric AI platforms are starting to appear. They allow to radically simplify annotation, better control the quality of the training data, iterate faster in training cycles, and ultimately complete AI projects 2-10 times faster. A flexible, modern architecture supporting state-of-the-art data labeling tools and workflows for audio, image, video and text datasets, together with meaningful customer traction and investor backing, are some of the factors making Kili one of the most relevant companies in this new area of machine learning. You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities. |
Key phrases
You Might Also Like
SRE Weekly Issue #422
Monday, April 29, 2024
View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries,
Quick question
Sunday, April 28, 2024
I want to learn how I can better serve you
Kotlin Weekly #404 (NOT FOUND)
Sunday, April 28, 2024
ISSUE #404 28st of April 2024 Announcements Kotlin Multiplatform State of the Art Survey 2024 Help to shape and understand the Kotlin Multiplatform Ecosystem! It takes 4 minutes to fill this survey.
📲 Why Is It Called Bluetooth? — Check Out This AI Text to Song Generator
Sunday, April 28, 2024
Also: What to Know About Emulating Games on iPhone, and More! How-To Geek Logo April 28, 2024 📩 Get expert reviews, the hottest deals, how-to's, breaking news, and more delivered directly to your
Daily Coding Problem: Problem #1425 [Easy]
Sunday, April 28, 2024
Daily Coding Problem Good morning! Here's your coding interview problem for today. This problem was asked by Microsoft. Suppose an arithmetic expression is given as a binary tree. Each leaf is an
PD#571 Software Design Principles I Learned the Hard Way
Sunday, April 28, 2024
If there's two sources of truth, one is probably wrong. And yes, please repeat yourself.
When Procrastination is Productive & Ghost integrating with ActivityPub
Sunday, April 28, 2024
Automattic, Texts, and Beeper join forces to build world's best inbox, Reflect launches its iOS app, how to start small rituals, and a lot more in this week's issue of Creativerly. Creativerly
C#503 Building pipelines with System.Threading.Channels
Sunday, April 28, 2024
Concurrent programming challenges can be effectively addressed using channels
RD#453 Get your codebase ready for React 19
Sunday, April 28, 2024
Is your app ready for what's coming up in React 19's release
☁️ Azure Weekly #464 - 28th April 2024
Sunday, April 28, 2024
Azure Weekly Newsletter Issue #464 powered by endjin Welcome to issue 464 of the Azure Weekly Newsletter. In AI we have a good mix of high-level and deep-dive technical articles. Next-Gen Customer