🏷 Edge#138: Toloka App Services Aims to Make Data Labeling Easier for AI Startups
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks that is worth your attention.

💥 What’s New in AI: Toloka App Services for AI Startups

Developing better AI models requires a vast amount of labeled data. Acquiring raw (unlabeled) data is relatively simple in this era of information technology, but labeling large volumes of unstructured data is highly labor-intensive. As a result, the majority of the time invested in an AI project goes to data-centric operations.

The success of supervised learning algorithms depends heavily on the amount and quality of labeled data. The labels guide the ML model during training so that it can classify unseen samples accurately. The data annotation process involves many steps, such as tagging, classification, and processing. For example, in an autonomous driving task, a single captured image may contain several objects (people, other vehicles, pavements, crosswalks, traffic lights, and so on), each of which needs to be identified separately. Furthermore, the annotation quality needs to be high; otherwise, faulty labels propagate into training and degrade the resulting model.

As data labeling methods become increasingly important to the success of machine learning (ML) solutions, we continue to cover some of the interesting platforms in the space. In Edge#107, which was about Crowdsourced vs. Automated vs. Hybrid Data Labeling, we briefly covered Toloka, a crowdsourced data labeling platform that was initially built to fit the needs of large-scale industrial ML pipelines. Today, we want to give an overview of their new service tailored for startups and teams in the early stages of AI production: Toloka App Services. Let’s dive in.
Data Labeling: The challenges for a novice

There are different ways to attack a data-labeling problem: crowdsourced, automated, or hybrid. The obvious tradeoff is between the scale and systematicity of automated approaches and the accuracy and simplicity of crowdsourced models. Toloka App Services is one of the solutions capitalizing on crowdsourced/hybrid data labeling methods. A general workflow for generating high-quality labeled ML datasets using a crowdsourced approach requires the following steps:
After all these steps, there are two main requirements of data labeling that have to be fulfilled:

-Accuracy: This measures how close the data labels are to the ground truth, which in turn determines how consistent an ML algorithm’s predictions are with the real world. Accuracy in data labeling is vital for the success of Computer Vision and Natural Language Processing tasks.

-Quality: This is the measure of accuracy across the entire generated dataset. Here you need to check whether the labels assigned by different data labelers reach a consensus, to ensure the consistency and accuracy of the dataset.

Toloka App Services

All of the above might sound like a daunting task for a team that has just started its AI journey and doesn’t want to put a lot of effort into training its first models. Toloka App Services tries to make it less of a headache by taking five out of these seven steps off your plate. First, you need to divide your project into smaller steps (Point 1) and provide instructions to the data labelers (Point 2). Toloka Apps will then take care of the actual data labeling operation, allowing you to focus on the analysis and deliver high-quality insights. The solution for engineers provides all the necessary components: pre-set interfaces, tools for balancing the speed/quality ratio, a global crowd, optimal matching of tasks and performers, a full range of automated quality control methods, dynamic pricing, and a free API to integrate it into the ML production pipeline.

How it works

Toloka includes three important components that together help ensure data labeling quality: expert benchmark, crowd input, and automation.

Expert benchmark: Toloka provides in-house expert labelers who help set a benchmark and create comprehensive guidelines for the consistency of labels. Toloka claims this results in a 40% lower error rate than the managed service solutions available on the market.
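To make the two requirements above concrete, here is a minimal sketch of how consensus among labelers and accuracy against an expert benchmark could be checked. It is not Toloka's API or its actual aggregation method, which this article does not describe: simple majority voting stands in for consensus, and every name, the sample data, and the 0.75 agreement threshold are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return (winning_label, agreement) for one item.

    agreement is the share of votes that went to the winning label.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


def accuracy(predicted, benchmark):
    """Share of benchmark items whose predicted label matches the expert label."""
    shared = [item for item in benchmark if item in predicted]
    if not shared:
        return 0.0
    hits = sum(predicted[item] == benchmark[item] for item in shared)
    return hits / len(shared)


# Hypothetical crowd votes: three labelers per image.
crowd_votes = {
    "img_1": ["car", "car", "truck"],
    "img_2": ["person", "person", "person"],
    "img_3": ["crosswalk", "pavement", "pavement"],
}

# Hypothetical expert labels for a subset of the items.
expert_benchmark = {"img_1": "car", "img_3": "crosswalk"}

consensus = {item: majority_vote(votes) for item, votes in crowd_votes.items()}
labels = {item: label for item, (label, _) in consensus.items()}

# Items where labelers disagree too much (assumed threshold) may need more votes.
low_agreement = [item for item, (_, share) in consensus.items() if share < 0.75]

print(labels)
print(low_agreement)
print(accuracy(labels, expert_benchmark))
```

In this toy data, two of the three images fall below the agreement threshold, and one of them also disagrees with the expert label, which is the kind of item a quality-control step would send back for additional labeling.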
Crowd input: When testing your data, humans should be part of the process to provide ground-truth monitoring. Using human-in-the-loop (HITL) workflows, Toloka lets you check whether your ML model produces the intended prediction, helps you identify gaps in the training data, and lets you give feedback to the model. HITL further allows retraining of the model if the prediction is incorrect or the confidence is below the set threshold. In Toloka, a managed crowd force provides the majority of labels. With millions of crowd performers available across every time zone, Toloka’s algorithms select the performers best suited for the target task, and accuracy is monitored thoroughly. This helps reduce the data labeling project time from several days to just minutes.

Automation: Toloka employs auto-labeling solutions to increase the size of the training set, the quality of labels (by adding an extra vote to each judgment), or the speed of labeling.

What is also nice about Toloka Apps is its user-friendly interface, which allows you to kickstart your data labeling project without ML expertise. The solution includes pre-set data labeling pipelines for real industry use cases. For the quality check, Toloka Apps employs pre-trained models and custom-built algorithms that, according to Toloka, provide high-quality labels in record time. We also like that you do not need to create and maintain control tasks, or set up and experiment with quality control rules. A fully functional quality control setup is already in place.

Versatility

Toloka Apps provides a consistent interface for labeling highly diverse types of datasets:
Conclusion

Data labeling is a time-consuming but essential part of any ML project. Any ML solution is only as good as the data used to train it. Toloka App Services is one of the emerging data labeling solutions on the market, offering a flexible and scalable data labeling service for a wide range of tasks. Toloka assures the quality of the data labels while also being cost-effective. The pre-set options for startups, together with a high degree of flexibility and customization, make Toloka App Services one of the most helpful tools on the market.