Edge 272: Inside Toolformer, Meta AI’s New Transformer That Learned to Use Tools to Produce Better Answers
The model mastered the use of tools such as calculators, calendars, and Wikipedia search queries across many downstream tasks.

Today’s large language models have made remarkable strides across natural language processing tasks and display a range of emergent capabilities. However, these models have certain inherent limitations that can only be partially mitigated by increasing their size. These limitations include an inability to access information about recent events, a tendency to fabricate information, difficulties in processing low-resource languages, a lack of mathematical proficiency, and an unawareness of the passage of time. One promising approach to overcoming these limitations is to equip language models with the ability to use external tools such as search engines, calculators, or calendars. However, current solutions either require extensive human annotations or are restricted to specific tasks, hindering wider adoption.

A few days ago, Meta AI published a research paper detailing Toolformer, a novel model that learns to use tools in a self-supervised manner without the need for human annotations. Meta AI’s approach with Toolformer is based on in-context learning and on generating datasets from scratch. Given just a few examples of how an API can be used, Toolformer annotates a large language modeling dataset with potential API calls. Through a self-supervised loss, the model determines which API calls are useful in predicting future tokens and fine-tunes itself accordingly. With this approach, language models can learn to control a variety of tools and to make informed decisions on when and how to use them.
Toolformer allows the model to retain its generality and to decide independently when and how to use various tools, so that tool use is not tied to specific tasks.

Inside the Toolformer Architecture

The core idea behind Toolformer is to enhance a language model (M) with the ability to use different tools via API calls. The inputs and outputs for each API are represented as text sequences, which enables the integration of API calls into any text using special tokens. Each API call is represented as a tuple (ac, ic), where ac is the name of the API and ic is its input. Given an API call (ac, ic) with a corresponding result (r), the linearized sequences of the API call without and with the result are denoted as e(ac, ic) and e(ac, ic, r), respectively.

The first step is to convert a dataset of plain texts into an augmented dataset by inserting API calls. This is done in three steps: sampling potential API calls, executing the API calls, and filtering the API calls based on their usefulness in predicting future tokens. After filtering, the remaining API calls are merged and interleaved with the original inputs to form the augmented dataset. The language model is then fine-tuned on this augmented dataset, allowing it to make its own decisions on when and how to use each tool based on its own feedback.

In the inference stage, the model generates text as usual until it produces the “→” token, indicating that it expects an API response. The appropriate API is then called to obtain the response, and the decoding process continues after inserting the response and the </API> token.

The researchers investigate various tools to address the limitations of regular language models (LMs). The only requirements for these tools are that their inputs and outputs can be represented as text sequences and that a few examples of how to use them are available.
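To make the linearization and filtering steps more concrete, here is a minimal Python sketch. It is not Meta AI’s code. The token strings, the function names linearize and is_useful, the threshold value, and the toy_loss stand-in are all assumptions made for illustration. A real system would compute the loss with the language model itself, and the paper’s criterion uses a weighted loss over future tokens that this sketch simplifies.

```python
from typing import Callable, Optional

# Special tokens; the exact strings are assumptions for this sketch.
API_START, API_END, ARROW = "<API>", "</API>", "→"


def linearize(api_name: str, api_input: str, result: Optional[str] = None) -> str:
    """Render an API call as text: e(ac, ic) without a result, e(ac, ic, r) with one."""
    call = f"{api_name}({api_input})"
    if result is None:
        return f"{API_START} {call} {API_END}"
    return f"{API_START} {call} {ARROW} {result} {API_END}"


def is_useful(
    loss_fn: Callable[[str, str], float],
    text_before: str,
    text_after: str,
    api_name: str,
    api_input: str,
    result: str,
    threshold: float,
) -> bool:
    """Keep a call only if seeing the call and its result lowers the loss on the
    following text by at least `threshold`, compared with the better of
    (a) no call at all and (b) the call without its result."""
    with_result = loss_fn(f"{text_before} {linearize(api_name, api_input, result)}", text_after)
    without_call = loss_fn(text_before, text_after)
    without_result = loss_fn(f"{text_before} {linearize(api_name, api_input)}", text_after)
    return min(without_call, without_result) - with_result >= threshold


def toy_loss(context: str, continuation: str) -> float:
    """Hypothetical stand-in for an LM loss: counts continuation words absent from the context."""
    return float(sum(1 for w in continuation.split() if w.strip(".,") not in context))


if __name__ == "__main__":
    before, after = "The sum of 400 and 1000 is", "1400."
    print(linearize("Calculator", "400 + 1000", "1400"))
    print(is_useful(toy_loss, before, after, "Calculator", "400 + 1000", "1400", 0.5))
    print(is_useful(toy_loss, before, after, "Calendar", "", "2023-03-09", 0.5))
```

In this toy setup the calculator call is kept because its result makes the continuation easier to predict, while the calendar call is dropped because it does not help.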
The five tools explored are a question-answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system.

1. Question Answering System: The question-answering system is based on another LM that can answer simple factual questions.
2. Calculator: The calculator can perform basic arithmetic operations and returns results rounded to two decimal places.
3. Wikipedia Search: The Wikipedia search engine returns short text snippets from Wikipedia based on a search term.
4. Machine Translation: The machine translation system can translate phrases from any language into English.
5. Calendar: The calendar returns the current date without taking any input, providing a temporal context for predictions that require an awareness of time.

The Toolformer implementation is based on a fine-tuned version of GPT-J, which has only 6.7 billion parameters. The model was able to outperform GPT-3 and GPT-J across several benchmarks. The ideas behind Toolformer represent a new frontier for LLMs, in which they not only perform sophisticated language tasks but also complement them with access to tools and APIs. Can’t wait to see Meta AI expand on these ideas.
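For readers who want to see the tool interface in code, the following Python sketch models two of the tools above, the calculator and the calendar, as plain text-in, text-out functions. It also shows an inference step that pauses when the generated text ends with a pending API call. Everything here is hypothetical and for illustration only: the names TOOLS, calculator, calendar and complete_api_call, the ISO date format, and the "<API> Name(input) →" syntax are assumptions, not the paper’s implementation. The calculator only handles +, -, * and / without parentheses.

```python
import ast
import operator
import re
from datetime import date
from typing import Optional

# Arithmetic operators the sketch calculator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node: ast.AST) -> float:
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    """Basic arithmetic, with the result rounded to two decimal places."""
    return f"{_eval(ast.parse(expression, mode='eval').body):.2f}"


def calendar(_: str = "", today: Optional[date] = None) -> str:
    """Return the current date; `today` can be injected for testing."""
    return (today or date.today()).isoformat()


# Hypothetical registry mapping API names to text-in, text-out functions.
TOOLS = {"Calculator": calculator, "Calendar": calendar}

# Matches text that ends with a pending call such as "<API> Calculator(1 + 2) →".
_PENDING = re.compile(r"<API>\s*(\w+)\(([^()]*)\)\s*→\s*$")


def complete_api_call(generated: str) -> str:
    """If the text ends with a pending call, run the tool and append result and </API>."""
    match = _PENDING.search(generated)
    if match is None:
        return generated  # nothing pending, decoding would simply continue
    name, api_input = match.groups()
    result = TOOLS[name](api_input)
    return f"{generated.rstrip()} {result} </API>"


if __name__ == "__main__":
    print(complete_api_call("The total is <API> Calculator(400 + 1000) →"))
```

Keeping every tool behind the same string-to-string signature is what lets a single decoding loop dispatch to any of them.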