🔵⚪️Edge#136: Kili Technology and Its Automated Data-Centric Training Platform
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention. It helps you become smarter about ML and AI.

💥 What’s New in AI: Kili Technology Presents an Automated Data-Centric Training Platform

Data training is one of the fastest-growing segments of the machine learning (ML) market. Supervised models’ dependence on labeled datasets has made data training one of the key components of modern ML pipelines. As a result, there has been an explosion of data training platforms looking to automate the creation of labeled datasets as well as their integration into training pipelines. In such a crowded space, it's hard to separate signal from noise and identify platforms that have the meaningful traction, ambitious roadmap, and strong financial backing to remain relevant in years to come. Today, we would like to discuss Kili Technology. This automated data-centric AI training platform has been flying under the radar but has steadily become one of the most important automated data training stacks on the market.

While AI has focused mostly on models, the real-world experience of those who put models into production shows that, most of the time, data is more important. When a system isn't working well, many teams instinctively try to improve the code. For many practical applications, it's more effective to focus on improving the data, because the training data is the new code.

What is an Automated Data-Centric Training Platform?

Data-centric AI is defined by Stanford Professor Andrew Ng as the practice of systematically engineering the data used to build AI systems. Kili is designed to serve this approach with two key capabilities:
- The ability to control the consistency of the data. It offers three tools for this: quality indicators such as consensus or honeypot to assess the consistency of the annotated data at all levels (project, asset, labeler, annotation), instructions to align annotators on a common definition of ground truth, and a review workflow to resolve conflicts.
- The ability to control the completeness of the data, i.e., to build a training dataset iteratively as a collection of scenarios to be covered. On the one hand, it offers a search engine that lets users tune model quality parameters to identify the data slices on which the model systematically underperforms. On the other hand, it offers annotation automation tools (e.g., interactive annotation) to quickly fill in the poorly covered data scenarios.

Kili Technology is a platform for automating the creation of high-quality training datasets from image, video, document, time series, and voice data. Providing simple and easy-to-use user experiences for different datasets has been one of the main limitations of data labeling platforms. Kili addresses this limitation head-on by enabling a set of intuitive user experiences to automate data annotation tasks in a collaborative fashion. Additionally, Kili simplifies the end-to-end training data management process while enforcing robust access control and security policies. The startup has meaningful customer adoption with companies such as IBM, SAP, OVH and Blue Prism and recently raised a $25 million Series A. Customer traction and funding are a great validation of a company, but the most important part is to evaluate the platform on its technical merits. Let's dive in.

Capabilities

One way to think about Kili Technology is as a set of automation workflows for the two main stages of an ML model's lifecycle: training and production. During the training phase, users create a project using Kili's web interface and upload their target datasets.
Depending on the type of data they need to annotate (image, video, text, audio), Kili provides user interfaces optimized for the specifics of that data type. Using those interfaces, users collaborate in a labeling process that culminates in the creation of the training dataset. As a training data management tool, Kili fits into the MLOps value chain between upstream data storage and downstream model training.

Highly Versatile Platform

The previous workflows are complemented by a series of intuitive user experiences and tools that streamline labeling tasks for different types of datasets, such as the following:
Architecture

Powering the tools and workflows in the Kili Technology platform is an architecture that works in both on-premise and cloud deployments. Here are a few interesting points about Kili's architecture:
This modern architecture allows Kili Technology to adapt to different infrastructures. The platform can be provisioned in a SaaS model using Google Cloud as well as in hybrid or on-premise models.

Kili Playground

One of the most interesting components of the Kili Technology architecture is the Kili Playground. Released as an open-source stack, Kili Playground is a Python library that abstracts the interaction with Kili’s GraphQL API, making it easier for data scientists and machine learning engineers to build data labeling processes using a few lines of code.

Conclusion

Three years ago, the annotation market was embryonic. Today, the first end-to-end data-centric AI platforms are starting to appear. They radically simplify annotation, give teams better control over the quality of the training data, speed up iteration in training cycles, and ultimately help complete AI projects 2-10 times faster. A flexible, modern architecture that supports state-of-the-art data labeling tools and workflows for audio, image, video and text datasets, together with meaningful customer traction and investor backing, makes Kili one of the most relevant companies in this new area of machine learning.
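
As a rough illustration of the consensus and honeypot quality indicators mentioned earlier, the following Python sketch computes both on toy data. The formulas (pairwise agreement for consensus, accuracy on gold-standard assets for honeypot), the function names, and the sample labels are all assumptions made for illustration. The article does not describe how Kili computes these indicators internally.

```python
from itertools import combinations


def consensus(labels):
    """Fraction of annotator pairs that chose the same label for one asset.

    Assumed definition for illustration only.
    """
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotator trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)


def honeypot_score(annotator_labels, gold_labels):
    """Share of gold-standard (honeypot) assets an annotator labeled correctly.

    Returns None when the annotator has not labeled any honeypot asset.
    """
    scored = [asset for asset in gold_labels if asset in annotator_labels]
    if not scored:
        return None
    correct = sum(annotator_labels[a] == gold_labels[a] for a in scored)
    return correct / len(scored)


# Hypothetical toy data: three annotators label two images.
annotations = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}

# Asset-level consensus, then a simple project-level average.
per_asset = {asset: consensus(labels) for asset, labels in annotations.items()}
project_consensus = sum(per_asset.values()) / len(per_asset)

# Hypothetical honeypot assets with known answers, and two annotators.
gold = {"img_1": "cat", "img_2": "dog"}
alice = {"img_1": "cat", "img_2": "dog"}
bob = {"img_1": "dog", "img_2": "dog"}

print(per_asset, project_consensus)
print(honeypot_score(alice, gold), honeypot_score(bob, gold))
```

In this toy example the annotators disagree on `img_1`, which pulls the project-level average down, and the lower honeypot score for `bob` points to the kind of conflict a review workflow would pick up.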