The Sequence Chat: Lianmin Zheng, UC Berkeley About Vicuna, Chatbot Arena and the Open Source LLM Revolution
The co-creator of one of the most important open-source LLMs shares his insights about research and development in foundation models.

Lianmin Zheng is a Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests include LLMs, compilers, and distributed systems. He was awarded the Meta PhD Fellowship. Currently, he is leading the LMSYS efforts and open-source projects including Vicuna and Chatbot Arena.

Quick bio
I am a Ph.D. student working at the intersection of AI and systems. I am committed to open-source AI research, developing better models (e.g., Vicuna), evaluations (e.g., Chatbot Arena and MT-Bench), and systems (e.g., FastChat, Alpa). I got started in AI through my undergraduate research projects.

🛠 AI Work
The vision of the Vicuna project is to build powerful models similar to OpenAI's ChatGPT, but with an open recipe. The rapid advancement of large language models (LLMs) has revolutionized AI systems, resulting in unprecedented levels of intelligence as seen in OpenAI's ChatGPT. However, despite its impressive performance, the training and architecture details of ChatGPT remain unclear, hindering research and open-source innovation in this field. So, we started the Vicuna project to replicate ChatGPT-like capability with an open recipe. The project was inspired by LLaMA and Alpaca. We emphasize the importance of data quality, so we chose what we found to be the best data source – user-shared conversations from ShareGPT.
We used standard instruction fine-tuning, with additional handling for multi-turn conversations. We carefully cleaned the collected conversations and compute the loss only on the assistant outputs. This makes Vicuna better at multi-turn conversations. In the latest versions of Vicuna, we also extend the context length to 16K with RoPE interpolation. All our code and hyperparameters are available at https://github.com/lm-sys/FastChat.
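The loss-masking step described above can be sketched in a few lines. This is a minimal illustration, not Vicuna's actual preprocessing code; the token IDs and per-token role labels are hypothetical, and -100 is the conventional ignore index for cross-entropy losses in frameworks like PyTorch:

```python
# Build training labels for a multi-turn conversation: only assistant
# tokens contribute to the loss; user and system tokens are masked with
# the ignore index so the optimizer never trains on them.
IGNORE_INDEX = -100

def mask_labels(token_ids, roles):
    """token_ids: list[int]; roles: list[str], one role label per token."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

# Example: a user turn (3 tokens) followed by an assistant reply (2 tokens).
tokens = [101, 102, 103, 201, 202]
roles = ["user", "user", "user", "assistant", "assistant"]
print(mask_labels(tokens, roles))  # [-100, -100, -100, 201, 202]
```

In a real trainer, these labels would be passed alongside the input IDs so the cross-entropy loss skips every masked position while the full conversation still provides context.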
To scale the training to larger models, you need more GPUs and better parallelism strategies. Fine-tuning a 33B model is actually not that challenging on the latest GPUs such as the H100 (80 GB) or A100 (80 GB), so we just use our existing code in FastChat, which relies on PyTorch FSDP for parallelism. If you want to scale efficiently to even larger models with more advanced parallelism strategies, you can check out Megatron-LM, DeepSpeed, or our research project Alpa.
We think we should evaluate LLMs on more open-ended and fresh questions, instead of multiple-choice questions like those in MMLU, so we started Chatbot Arena and MT-Bench. Chatbot Arena is a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner. So far, we have collected around 70K votes and used these votes to compute Elo ratings of models; you can check out the latest leaderboard. It is based on human preferences. MT-Bench is a small set of challenging multi-turn questions that can be used in a more controlled and automated manner. It is based on GPT-4 grading. The details can be found in our paper, Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. I think it is very challenging to build a robust evaluation.
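The Elo computation over pairwise votes can be sketched as follows. This is a minimal online-update illustration in the spirit of the Arena leaderboard, not the LMSYS implementation; the K-factor, initial ratings, and model names are illustrative assumptions:

```python
# Online Elo update from one pairwise battle outcome. Each vote moves the
# winner up and the loser down, weighted by how surprising the result was.
def elo_update(ratings, winner, loser, k=32, base=400):
    ra, rb = ratings[winner], ratings[loser]
    # Expected score of the winner under the current ratings.
    expected = 1 / (1 + 10 ** ((rb - ra) / base))
    ratings[winner] = ra + k * (1 - expected)
    ratings[loser] = rb - k * (1 - expected)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
elo_update(ratings, winner="model_a", loser="model_b")
print(ratings)  # model_a -> 1016.0, model_b -> 984.0
```

With equal ratings the expected score is 0.5, so a win transfers exactly k/2 points; replaying the full vote log through updates like this yields a leaderboard ordering.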
Yes. We are working on enhancing the reasoning and coding ability of Vicuna. Stay tuned!

💥 Miscellaneous – a set of rapid-fire questions
Compilers. Besides generative AI, I have worked on several compiler projects such as Alpa (based on JAX/XLA) and Ansor (based on TVM).
Achieving competitive performance in coding/algorithm competitions such as the International Olympiad in Informatics (IOI) and the International Collegiate Programming Contest (ICPC), without seeing the problems in their training data.
The latest Vicuna is fine-tuned from Llama 2. It focuses on chat ability and helpfulness. Compared to base models (e.g., Llama 2, Falcon), it has instruction-following ability. Compared to other fine-tunes, Vicuna's training data (ShareGPT) enables it to handle chat on a diverse range of topics.
I think that they will continue to coexist, similar to how we currently distribute software.