Dr. Joseph Gonzalez, UC Berkeley: Creating Gorilla and Language Models that Can Call APIs
The team behind Gorilla discusses the process of creating the model and the state of API-augmented language models. This interview includes answers from Dr. Gonzalez’s students Shishir G. Patil and Tianjun Zhang, who participated in the creation of the Gorilla model.

Quick bio
I am a Professor in the EECS department at UC Berkeley, a co-director and founding member of the UC Berkeley Sky Computing Lab and the RISE Lab, and a member of Berkeley AI Research (BAIR). I work on research spanning machine learning and systems. I am a co-founder of Aqueduct, and prior to joining Berkeley, I co-founded Turi Inc. (acquired by Apple), which was based on my thesis work.

🛠 ML Work
We started the Gorilla project with the goal of radically extending the tool capabilities of LLMs. However, as the project evolved, we realized we were working on something even bigger. The vision of Gorilla is to provide an LLM interface to the world. Today, we rely on a web browser to discover and use services. Tomorrow, AI technology will extend and maybe even replace the browser as our interface to the world. Through conversations with persistent context, LLMs will discover the right services and take the correct actions to help us complete tasks and even understand the scope of what we can accomplish. For example, next week I am traveling to give a talk, and Gorilla could examine my schedule, remind me this week, and notice that I still haven’t booked a rental car. It could find a discount EV rental service based on my preferences and perhaps even plan a road trip over the weekend.
Shishir: Gorilla is an LLM that is trained to write API calls accurately. We are able to do this thanks to an innovative training recipe we call Retriever Aware Training (RAT). In RAT, we train the LLM to be aware that the prompt is in fact augmented by a retriever, which allows us to treat the retrieved data differently from the user prompt itself. What makes API calls unique is that APIs are extremely brittle: even a single spelling error can cause a call to fail. Hence, it is a significantly more challenging task than plain text/code/image generation.
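To make that concrete, here is a minimal sketch of how a retriever-aware training example might be assembled. The retriever interface, prompt template, and dataset fields are illustrative assumptions for this sketch, not the exact Gorilla implementation.

```python
# Minimal sketch of retriever-aware training (RAT) data construction.
# The retriever object, its top_k method, and the prompt template are
# hypothetical; the key idea is that retrieved API documentation is
# part of the training prompt, not just an inference-time add-on.

def build_rat_example(instruction: str, gold_api_call: str, retriever) -> dict:
    """Augment a training instruction with retrieved API documentation.

    During fine-tuning, the model sees the retrieved docs alongside the
    instruction, so it learns to ground its API call in the retrieved
    context rather than in memorized (possibly stale) API signatures.
    """
    # Retrieve the best-matching API documentation for this instruction.
    retrieved_doc = retriever.top_k(instruction, k=1)[0]

    # Concatenate instruction and retrieved docs into one training prompt.
    prompt = (
        f"{instruction}\n"
        f"Use this API documentation for reference: {retrieved_doc}"
    )
    return {"prompt": prompt, "completion": gold_api_call}
```

Training on pairs like this is what lets the model treat retrieved text as reference material, so at inference time a better retriever directly yields better API calls.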
Shishir: Well, we don’t know for sure, given that the other models are closed-source :) But our best guess is that a few factors helped. First, as I mentioned, I think our RAT (retriever aware training) really shines when it comes to writing APIs. Second, introducing the ability to measure hallucination, something we can do with APIs, gave us a footing to actually compare and refine techniques. Third, constraining the LLM to focus on writing just API calls definitely helped.
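Because an API call either exists in the documentation or it does not, hallucination can be checked mechanically. Below is a simplified sketch in the spirit of Gorilla’s AST sub-tree matching; the matching rules in the paper are richer, and the `api_database` format here is an assumption for illustration.

```python
# Simplified hallucination check for generated API calls (Python 3.9+).
# This only verifies the function name and keyword-argument names against
# a known API database; Gorilla's actual AST sub-tree matching is richer.
import ast

def is_hallucinated(generated_call: str, api_database: dict) -> bool:
    """Return True if the generated call does not match any known API."""
    try:
        tree = ast.parse(generated_call, mode="eval")
        call = tree.body
        if not isinstance(call, ast.Call):
            return True
        name = ast.unparse(call.func)  # e.g. "torch.hub.load"
    except SyntaxError:
        return True  # unparseable output counts as hallucinated

    if name not in api_database:
        return True  # the API itself does not exist

    # Every keyword argument must appear in the documented signature.
    documented_kwargs = api_database[name]
    return any(kw.arg not in documented_kwargs for kw in call.keywords)

# Usage with a toy database:
api_db = {"torch.hub.load": {"repo_or_dir", "model", "pretrained"}}
print(is_hallucinated("torch.hub.load('pytorch/vision', model='resnet50')", api_db))  # False
print(is_hallucinated("torch.hub.download_model('resnet50')", api_db))               # True
```

A check like this turns hallucination from a judgment call into a measurable error rate, which is what makes it possible to compare and refine training techniques.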
Tianjun: We used a technique called Self-Instruct, a simple idea where you use the LLM itself to generate questions and their corresponding answers. We know LLMs today are really good at coming up with answers/solutions, and from the paper, they also seem to be good at generating questions. It turns out that by showing a few examples of question-answer pairs, the model is already good at coming up with instructions. For better quality, we also manually edit the questions and answers to make them more robust.
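A minimal sketch of a Self-Instruct-style prompt for this setting appears below. The few-shot examples and template are hypothetical; the returned prompt would be sent to any chat-completion LLM, and its output filtered and hand-edited as described above.

```python
# Minimal sketch of Self-Instruct-style data generation for API calls.
# The few-shot examples and prompt template are illustrative assumptions;
# the point is that a handful of seed pairs is enough to bootstrap many more.

FEW_SHOT_EXAMPLES = [
    ("I want to classify an image of a cat.",
     "torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)"),
    ("Translate a sentence from English to German.",
     "pipeline('translation_en_to_de', model='t5-base')"),
]

def build_self_instruct_prompt(api_doc: str) -> str:
    """Ask the LLM to invent a new instruction/API-call pair,
    seeded with a few hand-written examples."""
    shots = "\n".join(
        f"Instruction: {q}\nAPI call: {a}" for q, a in FEW_SHOT_EXAMPLES
    )
    return (
        "Here are examples of instructions paired with API calls:\n"
        f"{shots}\n\n"
        f"Given this API documentation:\n{api_doc}\n"
        "Write a new instruction a user might ask, and the matching API call."
    )
```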
Tianjun: Toolformer is a great demonstration of tool use in LLMs, but it only demonstrates the capability in a very narrow domain: using tools like a calculator or Wikipedia to answer a specific question. It also focuses on ~20 API calls, which is far fewer than Gorilla (which deals with 3,000+ API calls). We also built an extensive evaluation benchmark on these calls, rather than focusing on question answering. From our point of view, the solutions proposed by Gorilla and Toolformer differ mainly because they look at the problem from a different scale and perspective.

💥 Miscellaneous – a set of rapid-fire questions
I am still very excited about the work being done in computer vision and multi-modal models, as well as a lot of the more basic work connecting machine learning to various data sources (e.g., feature stores and vector stores). We are also looking at how to better serve models by exploring different tradeoffs between latency and throughput.
This is an important question and I don’t yet know where things will head. When the first open-source LLMs for chatting came out, a lot of people started to think that open-source LLMs would dominate the big commercial LLM providers. This hasn’t happened. For general reasoning, it is challenging to beat state-of-the-art commercial offerings. I think in the future we will see lots of open-source specialized models that perform certain tasks well (or at least well enough). Yet, just like web search, I still imagine there will be a few hosted LLMs that people use every day. This is because building, maintaining, and delivering LLM technology requires significant capital investments in people, data, and technology.
There is a major open question about how we balance retrieval-augmented generation with fine-tuning to incorporate domain knowledge. There are strengths and weaknesses to both approaches. Retrieval is limited by the quality of retrieved results, as well as our LLMs’ ability to deal with distracting content. Fine-tuning has the challenge of potentially requiring many models, and it is not yet clear how much to fine-tune or what the consequences of fine-tuning are for the underlying model’s abilities.
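To illustrate the retrieval side of that tradeoff, here is a minimal sketch of retrieval-augmented generation; the `retriever` and `llm` objects are assumed interfaces, not a specific library.

```python
# Minimal sketch contrasting the two ways of adding domain knowledge.
# Retrieval path: fetch documents at inference time and place them in the
# prompt; quality hinges on the retriever and on the model's ability to
# ignore distracting passages. The retriever/llm objects are assumptions.

def answer_with_rag(question: str, retriever, llm, k: int = 3) -> str:
    """Retrieval-augmented generation: knowledge enters via the prompt."""
    docs = retriever.top_k(question, k=k)
    context = "\n---\n".join(docs)
    prompt = (
        "Answer using only the context below. If the context is "
        f"irrelevant, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)

# Fine-tuning path (not shown): knowledge is baked into the weights by
# further training on domain data, which may require maintaining one
# specialized model per domain.
```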
I suspect this will quickly become a dominant focus of LLM research and commercial applications of LLMs. Being able to use tools and web services will make LLM technology significantly more powerful and useful.