📝 Guest Post: Adala – The First Open Source Data-Labeling Agent*
In this guest post, Jimmy Whitaker, Data Scientist in Residence at HumanSignal, introduces Adala, an Autonomous Data Labeling Agent framework that harmonizes AI's computational power with human judgment. It challenges conventional AI paradigms with dynamic agents that not only learn and adapt but also rely heavily on human-labeled data as a foundational bedrock. While the technology is still in its nascent stages, it aims to redefine data processing, model training, and the development of AI applications. It's open source, and you can contribute to it!

Generative AI, exemplified by Large Language Models (LLMs) like ChatGPT, has moved beyond being a mere tool for natural-language chat to become an active collaborator in our day-to-day work. The concept of building agents with LLMs as their core runtime, or compute primitive, has emerged as a groundbreaking approach. These LLM-powered agents are not limited to generating well-articulated content; they can be framed as potent general problem solvers.

Yet one might assume that with such progress, the need for traditional tasks like data labeling would diminish, with agents and other generative AI techniques picking up that burden. However, these systems still need to deliver correct and consistent results, and they cannot yet do so reliably on their own. Consequently, LLM-based systems still need to be guided by human insight, especially for domain-specific, complex, or nuanced tasks, to ensure the quality, reliability, and robustness of AI models.

Enter Adala, a new Autonomous Data Labeling Agent framework. While it embodies the forefront of AI-driven data processing, Adala acknowledges and emphasizes the irreplaceable role of human-labeled data. It's not about replacing the human touch but about harmonizing it with AI capabilities. While the technology behind the open-source Adala framework is still early, we're excited for you to explore the project, provide feedback, and contribute back. We believe Adala has the potential to reshape the landscape of data processing, model training, fine-tuning, and building AI applications. To help understand why, let's dive into the architecture and how you would use Adala to continuously train and fine-tune AI models.

Adala - A Data Labeling Agent Framework

At the heart of Adala lies a philosophy that challenges traditional AI paradigms. Unlike systems bound by static algorithms and predefined rules, Adala's agents are dynamic entities designed to learn, adapt, and evolve. This evolution is not random but is guided by experiences, data, and, most importantly, human feedback.
Agents are profoundly influenced by the context set for them. Typically, this context takes the shape of the prompts and skills provided to them. In the case of Adala, this context is primarily provided by a ground truth dataset. Such datasets, which can be created using platforms like Label Studio, serve as a foundational bedrock, guiding the agent's initial understanding and subsequent learning trajectories. As the environment evolves, perhaps by incorporating new ground truth data, agents further refine their skills, ensuring they remain relevant and accurate.

Let's look at an example. For a simple classification problem, we may create an agent with a "classification skill" to perform subjectivity detection. Initially, this skill may look as simple as defining instructions to retrieve labels from an LLM.
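To make that concrete, here is a minimal sketch of what such a bare-bones skill boils down to: an instruction prompt and an LLM call that returns a label. This is illustrative Python against the OpenAI chat API rather than Adala's actual interface; the instruction text, model choice, and function name are assumptions made for the example.

```python
# A minimal sketch (not Adala's actual API) of a bare-bones classification
# skill: an instruction prompt plus an LLM call that returns one of a fixed
# set of labels.
from openai import OpenAI  # assumes the openai>=1.0 package and an OPENAI_API_KEY env var

client = OpenAI()

INSTRUCTIONS = (
    "Classify the text as either 'Subjective' or 'Objective'. "
    "Respond with exactly one of those two labels."
)

def classify_subjectivity(text: str) -> str:
    """Zero-shot subjectivity detection: instructions only, no examples."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any chat-completion model works here
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": f"Text: {text}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify_subjectivity("The acting was wooden and the plot dragged on."))
```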
This skill may be sufficient for simple tasks, but it will likely miss the nuance in more complex examples. At this point, we could manually incorporate techniques like Chain-of-Thought or ReAct. But a better way is to ground the skill with a ground truth dataset. During this process, the agent uses an evaluation stage to compute metrics against the ground truth data and automatically incorporates nuanced examples into the skill as a form of few-shot learning, improving the agent's classification predictions. We can see an improved skill prompt learned from our ground truth data below.
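Conceptually, the grounding step looks something like the following continuation of the sketch above: score the current instructions against the ground truth, collect the examples they get wrong, and fold those back into the prompt as few-shot demonstrations. The dataset rows and helper functions here are hypothetical stand-ins; Adala performs this evaluation-and-improvement loop for you.

```python
# A rough, illustrative sketch of the grounding loop (not Adala's internal
# code): evaluate against ground truth, then turn mislabeled examples into
# few-shot demonstrations appended to the instructions.
ground_truth = [  # hypothetical labeled rows, e.g. exported from Label Studio
    {"text": "The phone ships with 8 GB of RAM.", "label": "Objective"},
    {"text": "Honestly, the camera is a letdown.", "label": "Subjective"},
    {"text": "The film was released in 2019.", "label": "Objective"},
]

def evaluate(classify, dataset):
    """Return accuracy plus the rows the current skill mislabels."""
    errors = [row for row in dataset if classify(row["text"]) != row["label"]]
    accuracy = 1 - len(errors) / len(dataset)
    return accuracy, errors

def add_few_shot_examples(instructions, hard_examples):
    """Append the nuanced (mislabeled) examples as few-shot demonstrations."""
    demos = "\n\n".join(
        f"Text: {row['text']}\nLabel: {row['label']}" for row in hard_examples
    )
    return f"{instructions}\n\nHere are examples of correct labels:\n\n{demos}"

accuracy, hard_examples = evaluate(classify_subjectivity, ground_truth)
IMPROVED_INSTRUCTIONS = add_few_shot_examples(INSTRUCTIONS, hard_examples)
```

Re-running classification with these improved instructions in the system prompt gives the few-shot version of the skill; repeating the evaluate-and-improve cycle is what the agent's learning iterations amount to.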
The key here is that human input directs the agent's learning process and constrains its direction. Human annotators can further enhance the data curated by the agent, focusing on the predictions that demand greater discernment and feeding this refined data back into the agent's learning cycle.

The Building Blocks of Adala: Flexibility and Extensibility at its Core

The strength of Adala lies in its modular architecture. Whether you're looking to process data, generate synthetic data, or curate multimodal datasets, Adala provides the tools and framework to make it happen. At the core of Adala are three fundamental components: the environment, the agent, and the runtime.

The Environment: Defined by Data

The environment in Adala is akin to the real-world context in which the agent operates. It supplies the essential data and sets the boundaries for the agent's operations. Crucially, this is the realm where human feedback is integrated, ensuring the agent operates with a clear and relevant context.

The Agent: The Heart of Adala

The agent is where the magic happens. It processes data, learns from it, and refines its actions based on environmental interactions. A standout feature of Adala's agents is their capability to craft custom skills. For instance, a classification skill can assess a dataset, scrutinize ground truth data, and enhance its performance based on human feedback. The versatility of these skills means they can be expanded to handle complex tasks, from intricate data curation to integrating student-teacher architectures or even tools tailored for computer vision. Additionally, agents can be equipped with a memory component, enabling them to tackle more advanced tasks.

The Runtime: Powering Adala's Operations

The runtime, the LLM backend on which the agent's skills execute, is the engine that drives Adala. It's the platform where the agent's skills come to life, ensuring Adala's seamless and efficient operation. Today, Adala supports OpenAI, but we are actively working to support more runtimes and are also seeking contributions to expand the library of available runtimes. The runtimes are designed for adaptability, allowing for the integration of new tools, plugins, and features. Fundamentally, Adala's runtime is crafted to fit smoothly into existing workflows and data processing pipelines. A toy sketch of how the environment, agent, and runtime fit together appears at the end of this post.

Adala - Pioneering the Future of Human-AI Collaboration

The quest for efficiency and quality often comes with a hefty price tag. Traditional data processing methods, while effective, can be resource-intensive and costly. Adala's vision is compelling: a future where AI doesn't replace humans but collaborates with them. A future where the computational prowess of AI agents and the nuanced understanding of humans come together in perfect harmony, ensuring outputs that are efficient, cost-effective, and of the highest quality.

The journey for Adala is just beginning. The potential of this technology is yet to be fully realized, and we can all contribute to shaping what's possible. Explore Adala, and join us virtually on Nov 7th for a live demo and overview of Adala's agent-based approach to data labeling.

*This post was written by Jimmy Whitaker, Data Scientist in Residence at HumanSignal. We thank HumanSignal for their insights and ongoing support of TheSequence.
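As promised above, here is a toy, self-contained sketch of how the three building blocks relate. These are not Adala's real classes; the names and fields are invented purely to mirror the roles described in this post, with a stand-in runtime so the snippet runs without any API credentials.

```python
# Toy illustration of Adala's three building blocks (NOT the library's real
# classes): the environment holds ground truth and human feedback, the runtime
# wraps whatever LLM backend executes the skills, and the agent ties them
# together.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Environment:
    """Supplies ground truth data and collects human feedback."""
    ground_truth: List[dict]                        # e.g. rows exported from Label Studio
    feedback: List[dict] = field(default_factory=list)

@dataclass
class Runtime:
    """Wraps the LLM backend that executes skills (OpenAI today)."""
    complete: Callable[[str], str]                  # prompt -> completion

@dataclass
class Agent:
    """Applies skills to data and refines them using the environment."""
    skills: Dict[str, str]                          # skill name -> instruction prompt
    environment: Environment
    runtime: Runtime

    def run(self, skill: str, text: str) -> str:
        """Apply one skill to a single record via the runtime."""
        prompt = f"{self.skills[skill]}\n\nText: {text}\nLabel:"
        return self.runtime.complete(prompt).strip()

    def accept_feedback(self, skill: str, text: str, corrected_label: str) -> None:
        """Record an annotator's correction and fold it back into the skill."""
        self.environment.feedback.append({"text": text, "label": corrected_label})
        self.skills[skill] += f"\n\nExample:\nText: {text}\nLabel: {corrected_label}"

# Wiring it together with a stand-in runtime (replace with a real LLM call).
agent = Agent(
    skills={"subjectivity": "Classify the text as Subjective or Objective."},
    environment=Environment(
        ground_truth=[{"text": "The film was released in 2019.", "label": "Objective"}]
    ),
    runtime=Runtime(complete=lambda prompt: "Subjective"),
)
print(agent.run("subjectivity", "The soundtrack is breathtaking."))
agent.accept_feedback("subjectivity", "The battery lasts 12 hours.", "Objective")
```

In the real framework the runtime would be one of Adala's OpenAI-backed runtimes and the evaluation loop from the earlier sketch would live inside the agent, but the division of responsibilities is the same.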