Edge 272: Inside Toolformer, Meta AI's New Transformer That Learned to Use Tools to Produce Better Answers
The model mastered the use of tools such as calculators, calendars, or Wikipedia search queries across many downstream tasks.

Today's large language models have made remarkable strides across a range of natural language processing tasks, displaying a range of emergent capabilities. However, these models have inherent limitations that can only be partially mitigated by increasing their size: an inability to access recent events, a tendency to fabricate information, difficulties with low-resource languages, a lack of mathematical proficiency, and an ignorance of the passage of time. One promising way to overcome these limitations is to equip language models with the ability to use external tools such as search engines, calculators, or calendars. However, current solutions either require extensive human annotations or are restricted to specific tasks, hindering wider adoption.

A few days ago, Meta AI published a research paper detailing Toolformer, a novel model that learns to use tools in a self-supervised manner, without the need for human annotations. Meta AI's approach with Toolformer is based on in-context learning and the generation of datasets from scratch. Given just a few examples of how an API can be used, Toolformer annotates a large language modeling dataset with potential API calls. Through a self-supervised loss, the model determines which API calls are useful for predicting future tokens and fine-tunes itself accordingly. With this approach, language models can learn to control a variety of tools and to make informed decisions about when and how to use them.
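The self-supervised filtering idea can be sketched in a few lines. This is a minimal illustration, not Meta AI's implementation: `lm_loss` is a stub standing in for the real language model's loss on future tokens, and the threshold `tau` is a hypothetical hyperparameter. A candidate API call is kept only if inserting the call together with its result reduces the loss on the continuation more than either leaving the text alone or inserting the call without its result.

```python
# Hypothetical sketch of Toolformer's self-supervised filtering step.
# `lm_loss` is a stub for illustration; a real implementation would
# score the continuation under the language model.

def lm_loss(prefix: str, continuation: str) -> float:
    """Stub loss: pretends richer prefixes make continuations easier
    to predict. Stands in for the LM's negative log-likelihood."""
    return len(continuation) / (1 + prefix.count(" "))

def keep_api_call(text_before: str, call_with_result: str,
                  call_without_result: str, continuation: str,
                  tau: float = 0.1) -> bool:
    """Keep a candidate API call only if inserting the call *and* its
    result lowers the loss on future tokens by at least `tau`, compared
    to the better of (no call at all) and (call without a result)."""
    loss_plain = lm_loss(text_before, continuation)
    loss_call_only = lm_loss(text_before + call_without_result, continuation)
    loss_with_result = lm_loss(text_before + call_with_result, continuation)
    return min(loss_plain, loss_call_only) - loss_with_result >= tau
```

Calls that fail this test are discarded, so only API calls that demonstrably help the model predict what comes next survive into the augmented training set.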
Toolformer allows the model to retain its generality and to decide independently when and how to use various tools, enabling a more comprehensive utilization of tools that is not tied to specific tasks.

Inside the Toolformer Architecture

The core idea behind Toolformer is to enhance a language model (M) with the ability to use different tools via API calls. The inputs and outputs of each API are represented as text sequences, which enables the integration of API calls into any text using special tokens. For training, Meta AI represented each API call as a tuple (ac, ic), where ac is the name of the API and ic is its input. Given an API call (ac, ic) with a corresponding result (r), the linearized sequences of the API call without and with the result are denoted e(ac, ic) and e(ac, ic, r), respectively.

The first step is to convert a dataset of plain texts into an augmented dataset by inserting API calls. This is done in three steps: sampling potential API calls, executing them, and filtering them based on their usefulness in predicting future tokens. The surviving API calls are then merged and interleaved with the original inputs to form the augmented dataset. The language model is finetuned on this augmented dataset, allowing it to make its own decisions about when and how to use each tool based on its own feedback.

At inference time, the model generates text as usual until it produces the "→" token, indicating that it expects an API response. The appropriate API is then called to obtain the response, and decoding continues after inserting the response and the </API> token.

The researchers investigated various tools to address the limitations of regular language models (LMs). The only requirements for these tools are that their inputs and outputs can be represented as text sequences and that the researchers can obtain a few examples of how to use them.
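The linearization and the inference-time interception described above can be sketched as follows. This is an illustrative stand-in, not Meta AI's code: the token names (`<API>`, `</API>`, "→") follow the paper's notation, while `run_tool` and its eval-based toy calculator are assumptions made for the example.

```python
# Hypothetical sketch: linearizing API calls as text, and interrupting
# decoding when the model emits the "→" token.
from typing import Optional

def linearize(api_name: str, api_input: str,
              result: Optional[str] = None) -> str:
    """e(ac, ic) when result is None, e(ac, ic, r) otherwise."""
    if result is None:
        return f"<API> {api_name}({api_input}) </API>"
    return f"<API> {api_name}({api_input}) → {result} </API>"

def run_tool(api_name: str, api_input: str) -> str:
    """Toy tool registry standing in for the real APIs.
    eval() is used only for brevity in this sketch."""
    tools = {"Calculator": lambda s: str(round(eval(s), 2))}
    return tools[api_name](api_input)

def decode_with_tools(model_output: str) -> str:
    """If the model has generated an API call up to the '→' token,
    execute the call, splice in the result and </API>, and return the
    text from which decoding would continue."""
    marker = "→"
    if marker not in model_output:
        return model_output  # no tool use requested
    prefix, _ = model_output.split(marker, 1)
    call = prefix[prefix.rindex("<API>") + len("<API>"):].strip()
    api_name, api_input = call.split("(", 1)
    result = run_tool(api_name, api_input.rstrip(") "))
    return prefix + marker + " " + result + " </API>"
```

For example, if the model emits `The ratio is <API> Calculator(400 / 1400) →`, the dispatcher calls the calculator and decoding resumes from `… → 0.29 </API>`.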
The five tools explored are a question-answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system:

1. Question Answering System: based on another LM that can answer simple factual questions.
2. Calculator: performs basic arithmetic operations and returns results rounded to two decimal places.
3. Wikipedia Search: returns short text snippets from Wikipedia based on a search term.
4. Machine Translation: translates phrases from any language into English.
5. Calendar: returns the current date without taking any input, providing temporal context for predictions that require an awareness of time.

The Toolformer implementation is based on a finetuned version of GPT-J, which uses only 6.7 billion parameters, yet it was able to outperform GPT-3 and GPT-J across several benchmarks. The ideas behind Toolformer represent a new frontier for LLMs, in which they not only perform sophisticated language tasks but complement them with access to tools and APIs. Can't wait to see Meta AI expand on these ideas.
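The two simplest tools in the list, the calculator and the calendar, are easy to sketch. These are illustrative stand-ins for the behavior described above, not Meta AI's implementations:

```python
# Minimal sketches of the calculator (results rounded to two decimal
# places) and the calendar (current date, no input), as described above.
import datetime

def calculator(expression: str) -> str:
    """Evaluate basic arithmetic and round to two decimal places.
    eval() is used only for brevity; a real tool would parse safely."""
    return str(round(eval(expression), 2))

def calendar() -> str:
    """Return the current date; takes no input."""
    return datetime.date.today().strftime("Today is %A, %B %d, %Y.")

print(calculator("1400 / 400"))  # → 3.5
```

Because both tools consume and produce plain text, they satisfy Toolformer's only requirement and can be spliced directly into the model's token stream.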