Dr. Joseph Gonzalez, UC Berkeley: Creating Gorilla and Language Models that Can Call APIs
The team behind Gorilla discusses the process of creating the model and the state of API-augmented language models. This interview includes answers from Dr. Gonzalez’s students, Shishir G. Patil and Tianjun Zhang, who participated in the creation of the Gorilla model.

Quick bio
I am a Professor in the EECS department at UC Berkeley, a co-director and founding member of the UC Berkeley Sky Computing Lab and RISE Lab, and a member of Berkeley AI Research (BAIR). I work on research spanning machine learning and systems. I am a co-founder of Aqueduct, and prior to joining Berkeley I co-founded Turi Inc. (acquired by Apple), which was based on my thesis work.

🛠 ML Work
We started the Gorilla project with the goal of radically extending the tool capabilities of LLMs. However, as the project evolved, we realized we were working on something even bigger. The vision of Gorilla is to provide an LLM interface to the world. Today, we rely on a web browser to discover and use services. Tomorrow, AI technology will extend and maybe even replace the browser as our interface to the world. Through conversations with persistent context, LLMs will discover the right services and take the correct actions to help us complete tasks, and even understand the scope of what we can accomplish. For example, next week I am traveling to give a talk. Gorilla could examine my schedule, remind me this week, and notice that I still haven’t booked a rental car. It could find a discount EV rental service based on my preferences and perhaps even plan a road trip over the weekend.
Shishir: Gorilla is an LLM that is trained to write API calls accurately. We are able to do this thanks to an innovative training recipe that we call RAT (Retriever Aware Training). In RAT, we train the LLM to be aware that the prompt is in fact augmented by a retriever, which allows us to treat the retrieved data differently from the user prompt itself. What makes API calls unique is that APIs are extremely brittle: even a single spelling mistake can cause the call to fail. Hence, it is a significantly more challenging task than plain text, code, or image generation.
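To make the recipe concrete, here is a minimal sketch of how a retriever-aware training example might be assembled; the prompt template and field names are illustrative assumptions rather than Gorilla’s actual format.

```python
# Minimal sketch: assembling a retriever-aware training example.
# The template and field names are illustrative assumptions, not Gorilla's actual format.

def build_rat_example(instruction: str, retrieved_api_doc: str, reference_call: str) -> dict:
    """Pair a user instruction with retrieved API documentation so the model
    learns to use the retrieved doc when relevant (or ignore it when it is not)."""
    prompt = (
        "You are given an API documentation snippet retrieved for the task below.\n"
        f"### Retrieved API documentation:\n{retrieved_api_doc}\n"
        f"### Task:\n{instruction}\n"
        "### API call:"
    )
    return {"prompt": prompt, "completion": reference_call}

example = build_rat_example(
    instruction="Translate an English sentence to German.",
    retrieved_api_doc="transformers.pipeline(task, model): builds an inference pipeline for a task.",
    reference_call="pipeline('translation_en_to_de', model='t5-base')",
)
print(example["prompt"])
```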
Shishir: Well, we don’t know for sure, given the other models are closed-source :) But our best guess is that a few factors helped. First, like I mentioned, I think our RAT (Retriever Aware Training) really shines when it comes to writing API calls. Second, introducing the ability to measure hallucination, something we can do with APIs, gave us a basis to actually compare and refine techniques. Third, making the LLM focus on writing just API calls definitely helped.
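Because the space of valid API calls is enumerable, hallucination can be checked more directly than in free-form text. The sketch below flags a generated call whose function does not appear in a database of known APIs; the simple name lookup is an illustrative simplification of the AST sub-tree matching described in the Gorilla paper, and the example API list is hypothetical.

```python
# Minimal sketch: flag a generated API call as hallucinated if the function it
# invokes does not appear in a database of known APIs. This is a simplification;
# the Gorilla paper uses a stricter AST sub-tree match against the API database.
import ast

KNOWN_APIS = {          # hypothetical database of valid API names
    "transformers.pipeline",
    "torch.hub.load",
}

def called_function(code: str) -> str | None:
    """Return the dotted name of the first function called in `code`, if any."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return None
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            return ast.unparse(node.func)
    return None

def is_hallucinated(generated_call: str) -> bool:
    return called_function(generated_call) not in KNOWN_APIS

print(is_hallucinated("torch.hub.load('pytorch/vision', 'resnet50')"))  # False
print(is_hallucinated("torch.hub.download_model('resnet50')"))          # True
```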
Tianjun: We used a technique called Self-Instruct, a simple idea where you use the LLM itself to generate questions and their corresponding answers. We know LLMs today are really good at coming up with answers/solutions, and from the paper, they also seem to be good at generating questions. It turns out that by showing a few examples of question-answer pairs, the model is already good at coming up with instructions. For better quality, we also manually edit the questions and answers to make them more robust.
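As an illustration, a Self-Instruct-style loop might look like the sketch below: a few seed question/API pairs serve as in-context examples, and the model is prompted to produce new pairs, which are then manually reviewed. The prompt format and the `llm_generate` helper are hypothetical.

```python
# Minimal sketch of a Self-Instruct-style data generator: seed question/answer
# pairs are shown to an LLM as in-context examples, and the model is asked to
# produce a new instruction paired with the API call that solves it.
import json

SEED_EXAMPLES = [
    {"question": "Classify the sentiment of a product review.",
     "api_call": "pipeline('sentiment-analysis')"},
    {"question": "Detect objects in a photo of a street scene.",
     "api_call": "pipeline('object-detection', model='facebook/detr-resnet-50')"},
]

def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in; plug in whatever LLM client you actually use."""
    raise NotImplementedError

def build_self_instruct_prompt(seeds: list[dict], api_doc: str) -> str:
    shots = "\n".join(json.dumps(s) for s in seeds)
    return (
        "Here are examples of user questions paired with the API call that solves them:\n"
        f"{shots}\n\n"
        f"API documentation:\n{api_doc}\n\n"
        "Write one new question a user might ask and the matching API call, "
        "as JSON with keys 'question' and 'api_call'."
    )

def synthesize_pair(api_doc: str) -> dict:
    raw = llm_generate(build_self_instruct_prompt(SEED_EXAMPLES, api_doc))
    return json.loads(raw)  # generated pairs are then manually reviewed and edited
```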
Tianjun: Toolformer is a great demonstration of tool use in LLMs, but it only demonstrates the capability in a very narrow domain: using tools like a calculator or Wikipedia to answer a specific question. It also focuses on roughly 20 API calls, far fewer than the 3,000+ API calls Gorilla deals with. We also built an extensive evaluation benchmark on these calls, rather than focusing on question answering. From our point of view, the solutions proposed by Gorilla and Toolformer differ mainly because they look at the problem at a different scale and from a different perspective.

💥 Miscellaneous – a set of rapid-fire questions
I am still very excited about the work being done in computer vision and multi-modal models, as well as a lot of the more basic work connecting machine learning to various data sources (e.g., Feature Stores and Vector Stores). We are also looking at how to better serve models by exploring different tradeoffs between latency and throughput.
This is an important question and I don’t yet know where things will head. When the first open-source LLMs for chat came out, a lot of people thought that open-source LLMs would dominate the big commercial LLM providers. This hasn’t happened. For general reasoning, it is challenging to beat state-of-the-art commercial offerings. I think in the future we will see lots of open-source specialized models that perform certain tasks well (or at least well enough). Yet, just like web search, I still imagine there will be a few hosted LLMs that people use every day. This is because building, maintaining, and delivering LLM technology requires significant capital investments in people, data, and technology.
There is a major open question about how we balance retrieval-augmented generation with fine-tuning to incorporate domain knowledge. There are strengths and weaknesses to both approaches. Retrieval is limited by the quality of retrieved results as well as our LLMs’ ability to deal with distracting content. Fine-tuning has the challenge of potentially requiring many models, and it is not yet clear how much to fine-tune or what the consequences of fine-tuning are for the underlying model’s abilities.
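For reference, the retrieval side of that tradeoff typically looks something like the sketch below at inference time: retrieve the most relevant documentation and prepend it to the prompt, leaving the base model untouched. The toy retriever and prompt template are illustrative assumptions.

```python
# Minimal sketch of retrieval-augmented generation for domain knowledge:
# retrieve the top-k relevant documents and prepend them to the prompt.
# The scoring function and prompt template are illustrative assumptions.
from collections import Counter

DOCUMENTS = [
    "torch.hub.load(repo, model): loads a model from a GitHub repository.",
    "transformers.pipeline(task, model): builds an inference pipeline for a task.",
]

def score(query: str, doc: str) -> int:
    """Toy lexical-overlap retriever; a real system would use dense embeddings."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    return sorted(DOCUMENTS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Use the following documentation if relevant:\n{context}\n\nTask: {query}\nAPI call:"

print(build_prompt("Load a pretrained model from a GitHub repo"))
```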
I suspect this will quickly become a dominant focus of LLM research and commercial applications of LLMs. Being able to use tools and web services will make LLM technology significantly more powerful and useful.