🏷 Edge#138: Toloka App Services Aims to Make Data Labeling Easier for AI Startups
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention.

💥 What’s New in AI: Toloka App Services for AI Startups

Developing better AI models requires a vast amount of labeled data. Acquiring raw (unlabeled) data is relatively simple in this era of information technology, but labeling large volumes of unstructured data is a highly labor-intensive job. Thus, the majority of the time invested in an AI project is allotted to data-centric operations.

The success of supervised learning algorithms depends heavily on the amount and quality of labeled data. The labels guide the ML model in the right direction so that it can classify unseen samples accurately. The entire data annotation process involves many steps, like tagging, classification, and processing. For example, in an autonomous driving task, a single captured image may contain several objects, such as people, other vehicles, pavements, crosswalks, and traffic lights, each of which needs to be identified separately. Furthermore, the annotation quality needs to be high; otherwise, faulty data labels will have an amplified adverse effect on ML model training.

As data labeling methods become increasingly important to the success of machine learning (ML) solutions, we continue to cover some of the interesting platforms in the space. In Edge#107, which was about Crowdsourced vs. Automated vs. Hybrid Data Labeling, we briefly covered Toloka, a crowdsourced data labeling platform that was initially built to meet the needs of large-scale industrial ML pipelines. Today, we want to give an overview of their new service tailored for startups and teams in the early stage of their AI production: Toloka App Services. Let’s dive in.
Data Labeling: The challenges for a novice

There are different ways to attack a data-labeling problem: crowdsourced, automated, or hybrid. The obvious tradeoff is between the scale and systematicity of automated approaches and the accuracy and simplicity of crowdsourced models. Toloka App Services is one of the solutions capitalizing on crowdsourced/hybrid data labeling methods. A general workflow for generating high-quality labeled ML datasets using a crowdsourced approach requires the following steps:
After all these steps, there are two main requirements of data labeling that have to be fulfilled:

- Accuracy: This measures how close the data labels are to the ground truth, i.e., how consistently they reflect the real world. Accuracy in data labeling is vital for the success of Computer Vision and Natural Language Processing tasks.
- Quality: This is the measure of the accuracy of the entire dataset generated. Here you need to check whether the labels assigned by different data labelers reach a consensus, to ensure the consistency and accuracy of the dataset.

Toloka App Services

All of the above might sound like a daunting task for a team that has just started its AI journey and doesn’t want to put a lot of effort into training its first models. Toloka App Services tries to make it less of a headache by taking five out of these seven steps off your plate. You only need to divide your project into smaller steps (Point 1) and provide instructions to the data labelers (Point 2). Toloka Apps will take care of the actual data labeling operation, allowing you to focus on the analysis and deliver high-quality insights.

The solution for engineers provides all the necessary components: the pre-set interfaces, tools for balancing the speed/quality ratio, the global crowd, optimal matching of tasks and performers, a full range of automated quality control methods, dynamic pricing, and a free API to integrate it into the ML production pipeline.

How it works

Toloka includes three important components that together help ensure data labeling quality: expert benchmark, crowd input, and automation.

Expert benchmark: Toloka provides in-house expert labelers who help set a benchmark and create comprehensive guidelines for the consistency of labels. Toloka claims this results in a 40% lower error rate than the managed service solutions available on the market.
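To make the ideas of consensus and an expert benchmark more concrete, here is a minimal sketch in Python. It is not Toloka's algorithm or API: the function names, the item IDs, and the sample labels are all hypothetical, and plain majority voting is assumed as the aggregation rule. It shows how labels from several crowd performers could be merged into one label per item and then scored against a small expert-labeled benchmark.

```python
from collections import Counter


def majority_vote(votes):
    """Return (winning_label, agreement) for one item's list of crowd votes.

    agreement is the share of votes that went to the winning label.
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)


def accuracy(predicted, benchmark):
    """Share of benchmark items whose predicted label matches the expert label."""
    shared = [item for item in benchmark if item in predicted]
    if not shared:
        return 0.0
    correct = sum(predicted[item] == benchmark[item] for item in shared)
    return correct / len(shared)


# Hypothetical data: three crowd votes per image, plus expert labels.
crowd_votes = {
    "img_1": ["car", "car", "truck"],
    "img_2": ["person", "person", "person"],
    "img_3": ["crosswalk", "pavement", "pavement"],
}
expert_benchmark = {"img_1": "car", "img_2": "person", "img_3": "crosswalk"}

# Aggregate the crowd votes, then compare the result with the benchmark.
aggregated = {item: majority_vote(votes)[0] for item, votes in crowd_votes.items()}
agreement = {item: majority_vote(votes)[1] for item, votes in crowd_votes.items()}
benchmark_accuracy = accuracy(aggregated, expert_benchmark)

print(aggregated)
print(agreement)
print(benchmark_accuracy)
```

In this toy example, the crowd agrees with the experts on two of the three images. Items with low agreement, or items where the consensus differs from the expert label, are natural candidates for clearer instructions or another round of labeling.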
Crowd input: While testing your data, humans should be part of the process to provide ground truth monitoring. Using Humans-in-the-loop (HITL), Toloka lets you check whether your ML model provides the intended prediction, helps you identify gaps in the training data, and lets you give feedback to the model. HITL further allows retraining of the model if the prediction is incorrect or the confidence is below the set threshold. In Toloka, a managed crowd force provides the majority of labels. With millions of crowd performers available across every time zone, Toloka’s algorithms select the performers that are best suited for the target task, and the accuracy is monitored thoroughly. This helps dramatically reduce the data labeling project time from several days to just minutes.

Automation: Toloka employs auto-labeling solutions to increase the size of the training set, the quality of labels (by adding an extra vote to each judgment), or the speed of labeling.

What is also nice about Toloka Apps is its user-friendly interface, which allows you to kickstart your data labeling project without ML expertise. The solution includes pre-set data labeling pipelines for real-world industry use cases. For the quality check, Toloka Apps employs pre-trained models and custom-built algorithms that, according to Toloka, provide high-quality labels in record time. We also like that you do not need to create and maintain control tasks, or set up and experiment with quality control rules. A fully functional quality control setup is already in place.

Versatility

Toloka Apps enables a consistent interface for labeling highly diverse types of datasets:
Conclusion

Data labeling is a time-consuming but essential part of any ML project. Any ML solution is only as good as the data that is used to train it. Toloka App Services is one of the emerging data labeling solutions on the market that offers a flexible and scalable data labeling service for a wide range of tasks. Toloka assures the quality of the data labels while also being cost-effective. The pre-set options for startups, together with high flexibility and room for customization, make Toloka App Services one of the most helpful tools on the market.