🏷 Edge#138: Toloka App Services Aims to Make Data Labeling Easier for AI Startups
This is an example of TheSequence Edge, a Premium newsletter that our subscribers receive every Tuesday and Thursday. On Thursdays, we do deep dives into one of the freshest research papers or technology frameworks worth your attention.

💥 What’s New in AI: Toloka App Services for AI Startups

Developing better AI models requires a vast amount of labeled data. Acquiring raw (unlabeled) data is relatively simple these days, but labeling large volumes of unstructured data is highly labor-intensive. As a result, most of the time invested in an AI project goes to data-centric operations. The success of supervised learning algorithms depends heavily on the amount and quality of labeled data: the labels guide the ML model so that it can classify unseen samples accurately.

The data annotation process involves many steps, such as tagging, classification, and processing. For example, in an autonomous driving task, a single captured image may contain several objects (people, other vehicles, pavements, crosswalks, traffic lights, etc.), each of which needs to be identified separately. Furthermore, annotation quality needs to be high; otherwise, the errors in faulty labels will be amplified during ML model training.

As data labeling methods become increasingly important to the success of machine learning (ML) solutions, we continue to cover some of the interesting platforms in the space. In Edge#107, which was about Crowdsourced vs. Automated vs. Hybrid Data Labeling, we briefly covered Toloka, a crowdsourced data labeling platform that was initially built to fit the needs of large-scale industrial ML pipelines. Today, we want to give an overview of their new service tailored for startups and teams in the early stages of AI production: Toloka App Services. Let’s dive in.
Data Labeling: The challenges for a novice

There are different ways to attack a data-labeling problem: crowdsourced, automated, or hybrid. The obvious tradeoff is between the scale and systematicity of automated approaches and the accuracy and simplicity of crowdsourced models. Toloka App Services is one of the solutions capitalizing on crowdsourced/hybrid data labeling methods. A general workflow for generating high-quality labeled ML datasets using a crowdsourced approach requires the following steps:
After all these steps, there are two main requirements of data labeling that have to be fulfilled:

- Accuracy: This measures how close the data labels are to the ground truth, so that the predictions of an ML algorithm trained on them are consistent with the real world. Accuracy in data labeling is vital for the success of Computer Vision and Natural Language Processing tasks.
- Quality: This is the measure of the accuracy of the entire generated dataset. Here you need to check whether the labels assigned by different data labelers reach a consensus, to ensure the consistency and accuracy of the dataset.

Toloka App Services

All of the above might sound like a daunting task for a team that has just started its AI journey and doesn’t want to put a lot of effort into training its first models. Toloka App Services tries to make it less of a headache by taking five out of these seven steps off your plate. First, you need to divide your project into smaller steps (Point 1) and provide instructions to the data labelers (Point 2). Toloka Apps will take care of the actual data labeling operation, allowing you to focus on the analysis and deliver high-quality insights. The solution for engineers provides all the necessary components: pre-set interfaces, tools for balancing the speed/quality ratio, a global crowd, optimal matching of tasks and performers, a full range of automated quality control methods, dynamic pricing, and a free API to integrate it into the ML production pipeline.

How it works

Toloka includes three important components that together help ensure data labeling quality: expert benchmark, crowd input, and automation.

Expert benchmark: Toloka provides in-house expert labelers who help set a benchmark and create comprehensive guidelines for the consistency of labels. Toloka claims this results in a 40% lower error rate than other managed service solutions on the market.
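To make the accuracy and consensus requirements above more concrete, here is a minimal sketch of how crowd labels could be aggregated by majority vote and then scored against a small expert benchmark. It is only an illustration: the item names, labels, and helper functions (`majority_vote`, `accuracy`) are hypothetical and are not part of Toloka's API, and majority voting is a simple stand-in for whatever aggregation the platform actually uses.

```python
from collections import Counter


def majority_vote(votes):
    """Return (winning_label, agreement) for one item.

    agreement is the share of votes given to the winning label.
    Ties are broken by first appearance, which is an arbitrary choice.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


def accuracy(aggregated, benchmark):
    """Share of benchmark items whose aggregated label matches the expert label."""
    shared = [item for item in benchmark if item in aggregated]
    if not shared:
        return 0.0
    matches = sum(aggregated[item] == benchmark[item] for item in shared)
    return matches / len(shared)


# Hypothetical crowd votes: three labelers per image.
crowd_votes = {
    "img_1": ["car", "car", "truck"],
    "img_2": ["pedestrian", "pedestrian", "pedestrian"],
    "img_3": ["traffic_light", "crosswalk", "crosswalk"],
}

# Hypothetical expert labels for a subset of the items.
expert_benchmark = {"img_1": "car", "img_3": "traffic_light"}

aggregated = {}
agreement = {}
for item, votes in crowd_votes.items():
    aggregated[item], agreement[item] = majority_vote(votes)

print(aggregated)                               # consensus label per item
print(agreement)                                # how strongly labelers agreed
print(accuracy(aggregated, expert_benchmark))   # match rate against experts
```

Items with low agreement, or items where the consensus disagrees with the expert label, are natural candidates for relabeling or for clearer instructions.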
Crowd input: When testing your data, humans should be part of the process to provide ground truth monitoring. Using human-in-the-loop (HITL), Toloka lets you check whether your ML model provides the intended prediction, helps you identify gaps in the training data, and lets you give feedback to the model. HITL further allows retraining of the model if the prediction is incorrect or the confidence is below the set threshold. In Toloka, a managed crowd force provides the majority of labels. With millions of crowd performers available across every time zone, Toloka’s algorithms select the performers best suited for the target task, and accuracy is monitored thoroughly. This helps dramatically reduce data labeling project time from several days to just minutes.

Automation: Toloka employs auto-labeling solutions to increase the size of the training set, the quality of labels (by adding an extra vote to each judgment), or the speed of labeling.

What is also nice about Toloka Apps is its user-friendly interface, which allows you to kickstart your data labeling project without ML expertise. The solution includes pre-set data labeling pipelines for real-world industry use cases. For the quality check, Toloka Apps employs pre-trained models and custom-built algorithms that aim to provide the best-quality labels in record time. We also like that you do not need to create and maintain control tasks, or set up and experiment with quality control rules. A fully functional quality control setup is already in place.

Versatility

Toloka Apps enables a consistent interface for labeling highly diverse types of datasets:
Conclusion

Data labeling is a time-consuming but essential part of any ML project. Any ML solution is only as good as the data used to train it. Toloka App Services is one of the emerging data labeling solutions on the market, offering a flexible and scalable data labeling service for a wide range of tasks. Toloka assures the quality of the data labels while also being cost-effective. The pre-set options for startups, combined with high flexibility and customization, make Toloka App Services one of the most helpful tools on the market.