🔵⚪️Edge#136: Kili Technology and Its Automated Data-Centric Training Platform
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention. It helps you become smarter about ML and AI.

💥 What’s New in AI: Kili Technology Presents an Automated Data-Centric Training Platform

Data training is one of the fastest-growing segments of the machine learning (ML) market. Supervised models’ dependency on labeled datasets has made data training one of the key components of modern ML pipelines. As a result, there has been an explosion of data training platforms looking to automate the creation of labeled datasets as well as their integration into training pipelines. In such a crowded space, it's hard to separate signal from noise and identify platforms that have meaningful traction, an ambitious roadmap, and strong financial backing to remain relevant in years to come. Today, we would like to discuss Kili Technology. This automated data-centric AI training platform has been flying under the radar but has steadily become one of the most important automated data training stacks on the market.

While AI has focused mostly on models, the real-world experience of those who put models into production shows that, most of the time, data is more important. When a system isn't working well, many teams instinctively try to improve the code, whereas, for many practical applications, it's more effective to focus on improving the data because the training data is the new code.

What is an Automated Data-Centric Training Platform?

Data-centric AI is defined by Stanford Professor Andrew Ng as the practice of systematically engineering the data used to build AI systems. Kili is designed to serve this approach with two key features:

- the ability to control the consistency of the data.
  For this, it offers three mechanisms: quality indicators such as consensus or honeypot to assess the consistency of the annotated data at all levels (project, asset, labeler, annotation), instructions to align annotators on a common definition of ground truth, and a review workflow to disambiguate conflicts.
- the ability to control the completeness of the data, i.e., to build a training dataset iteratively as a collection of scenarios to be covered. On the one hand, it offers a search engine that lets users tune model quality parameters to identify data slices on which the model systematically underperforms and, on the other hand, annotation automation tools (e.g., interactive annotation) to quickly fill in the poorly covered data scenarios.

Kili Technology is a platform for automating the creation of high-quality training datasets from image, video, document, time series, and voice data. Providing simple and easy-to-use user experiences for different datasets has been one of the main limitations of data labeling platforms. Kili addresses this limitation head-on by enabling a set of intuitive user experiences to automate data annotation tasks in a collaborative fashion. Additionally, Kili simplifies the end-to-end training data management process while enforcing robust access control and security policies. The startup has meaningful customer adoption with companies such as IBM, SAP, OVH and Blue Prism and recently raised a $25 million Series A. Customer traction and funding are a great validation of a company, but the most important part is to evaluate the platform on its technical merits. Let's dive in.

Capabilities

A way to think about Kili Technology is as automation workflows for the two main stages of ML models: training and production. During the training phase, users create a project using Kili's web interface and upload their target datasets.
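
As an aside on the consensus and honeypot quality indicators mentioned earlier, here is a minimal sketch of how such indicators can be computed. The definitions are simplified assumptions chosen for illustration (pairwise agreement for consensus, accuracy on gold-standard assets for honeypot); they are not Kili's actual formulas, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def consensus_score(labels):
    """Fraction of annotator pairs that agree on one asset (1.0 = full agreement)."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotator has nobody to disagree with
    return sum(a == b for a, b in pairs) / len(pairs)


def honeypot_score(annotator_labels, gold_labels):
    """Share of gold-standard ("honeypot") assets an annotator labeled correctly."""
    checked = [asset for asset in gold_labels if asset in annotator_labels]
    if not checked:
        return None  # the annotator has not seen any honeypot asset yet
    correct = sum(annotator_labels[a] == gold_labels[a] for a in checked)
    return correct / len(checked)


# Hypothetical toy data: three annotators label the same three images.
annotations = {
    "img_1": ["cat", "cat", "cat"],
    "img_2": ["cat", "dog", "cat"],
    "img_3": ["dog", "cat", "bird"],
}

# Asset-level consensus, then a project-level average.
per_asset = {asset: consensus_score(labels) for asset, labels in annotations.items()}
project_consensus = sum(per_asset.values()) / len(per_asset)

# Labeler-level honeypot: compare one annotator against known-good answers.
gold = {"img_1": "cat", "img_2": "cat"}
alice = {"img_1": "cat", "img_2": "dog", "img_3": "dog"}
alice_honeypot = honeypot_score(alice, gold)

print(per_asset, project_consensus, alice_honeypot)
```

Assets with low consensus (such as `img_3` here) are natural candidates for the review workflow, while a low honeypot score points at a labeler who may need clearer instructions.
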
Depending on the type of data they need to annotate (image, video, text, audio), Kili provides user interfaces optimized for the specifics of that dataset. Using those interfaces, users can collaborate on the labeling process, which culminates in the creation of the training dataset. As a training data management tool, Kili fits into the MLOps value chain between upstream data storage and downstream model training.

Highly Versatile Platform

The previous workflows are complemented with a series of intuitive user experiences and tools that streamline labeling tasks for different types of datasets, such as the following:

Architecture

Powering the tools and workflows in the Kili Technology platform is an architecture that works in on-premise and cloud deployments. Here are a few interesting points about Kili's architecture:
This modern architecture allows Kili Technology to adapt to different infrastructures. The platform can be provisioned in a SaaS model using Google Cloud as well as in hybrid or on-premise models.

Kili Playground

One of the most interesting components of the Kili Technology architecture is the Kili Playground. Released as an open-source stack, Kili Playground is a Python library that abstracts the interaction with Kili’s GraphQL API, making it easier for data scientists and machine learning engineers to build data labeling processes using a few lines of code.

Conclusion

Three years ago, the annotation market was embryonic. Today, the first end-to-end data-centric AI platforms are starting to appear. They radically simplify annotation, give better control over the quality of the training data, speed up iteration in training cycles, and ultimately help complete AI projects 2-10 times faster. A flexible, modern architecture that supports state-of-the-art data labeling tools and workflows for audio, image, video and text datasets, meaningful customer traction, and investor backing are some of the factors making Kili one of the most relevant companies in this new area of machine learning.