The Sequence Chat: Sharon Zhou: CEO, Lamini on RLHF and Fine-Tuning LLMs
Lamini has a front-row seat to some of the challenges of fine-tuning and building instruction-following LLMs.

👤 Quick bio
I had an unconventional childhood, with a lot of spontaneity, adventure, the outdoors, and, notably, not a lot of technology. I studied ancient history because I liked learning about what has stood the test of time across millennia, from the ancient Greeks, Romans, and Chinese. I've always wanted to contribute something to society that could leave such a legacy.

I found technology by way of user experience and especially one of its golden rules: it's never the user's fault. This tenet has compelled me to design technology so that those who hadn't grown up with it, like myself, are less intimidated by it. Naturally, I applied this principle as a product manager, first at ML/data startups, then at Google. At Google, I learned that Google was the type of company and legacy I wanted to build: it has been technologically transformative, dramatically changing the way we interact with technology and information. Modeling after Larry and Sergey, the founders of Google, I set out to drop out of Stanford's PhD program. The joke was that I "failed to drop out" and graduated, because I fell in love with deep generative models, now more popularly known as generative AI. After graduating, I taught and led a research group at Stanford as computer science faculty in generative AI, and also created and taught a Coursera course on generative AI to over 80,000 students, one of the largest courses in the world.

Now, I've started a company, Lamini, that combines my two loves: user experience and generative AI. Lamini is building a large language model (LLM) engine that makes it easy and fast for engineers without a top AI PhD to train these magical models, using fine-tuning, RLHF, and more (we're barely scratching the surface :)). My dream is to see far more LLMs out there in the world, covering the space of possible applications. I believe LLMs should be built for everyone, so building them needs to be accessible and usable by everyone.

🛠 ML Work
Lamini’s mission is to give every developer the superpowers that took the world from GPT-3 to ChatGPT! Quotes from our customers at Fortune 500 companies have really motivated us to build this. “Our team of 10 machine learning engineers hit the OpenAI fine-tuning API, but our model got worse — help!” Or even: “I don’t know how to make the best use of my data — I’ve exhausted all the prompt magic we can summon from tutorials online.” Lamini is an LLM engine that allows any developer, not just machine learning experts, to train high-performing LLMs, as good as ChatGPT, on large datasets with just a few lines of code from the Lamini library. Here’s our launch blogpost for more info!
Probably running PPO and fine-tuning the LM policy. There are a lot of optimizations that can be done to make this better, and it's still unclear whether it increases or decreases hallucinations. But first, a bit on RLHF: it's trying to unlock more from your data and align the model with how you want it to behave when you interact with it. Basing this a bit on Yoav Goldberg's note, RLHF works because it provides richer signals by way of (1) assigning graded scores to the model's outputs (instead of just right/wrong, you have varying levels of correctness in the form of rankings) and (2) providing negative signal back into the model (instead of just the absence of bad). What's challenging about it is using another model, trained on additional data, to provide that richer signal back into your original transformer. For many non-transformer models, this would be easy. For transformers, it isn't so straightforward.
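The ranking signal in (1) is typically distilled into a reward model trained with a pairwise loss. A minimal sketch of that idea, using a generic Bradley-Terry-style objective on scalar rewards (not Lamini's or any specific lab's implementation):

```python
import math

def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: push the reward of the human-preferred
    response above the rejected one. Loss shrinks toward 0 as the margin grows."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A human ranking (best > mid > worst) decomposes into pairwise comparisons,
# yielding richer signal than a single right/wrong label per output.
rewards = {"best": 2.0, "mid": 0.5, "worst": -1.0}
pairs = [("best", "mid"), ("best", "worst"), ("mid", "worst")]
loss = sum(pairwise_reward_loss(rewards[a], rewards[b]) for a, b in pairs) / len(pairs)
```

The trained reward model then scores rollouts during PPO, which is where the "richer signal back into your original transformer" enters the picture.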
Data is a key component to building high-performing LLMs. Data is useful when it’s high quality, large, and covering your entire use case. Usually, that’s not the case out of the gate, even if you’re Walmart. So that’s where data generation, even if a bit noisy, comes into play to boost your LLM’s performance further. That is to say, data generation is useful beyond low data scenarios.
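One simple form of data generation is expanding a handful of seed examples into many instruction-response pairs. A toy sketch with hypothetical seed facts and templates (a real pipeline would use an LLM to paraphrase and expand, then filter for quality; this is not Lamini's pipeline):

```python
import itertools

# Hypothetical seed facts and instruction templates for illustration only.
seed_facts = [("return window", "30 days"), ("warranty", "1 year")]
templates = [
    "What is the {topic}?",
    "How long is the {topic}?",
]

def generate_pairs(facts, templates):
    """Cross seed facts with templates to multiply training examples,
    trading a little noise for coverage of the use case."""
    return [
        {"instruction": template.format(topic=topic), "response": answer}
        for (topic, answer), template in itertools.product(facts, templates)
    ]

pairs = generate_pairs(seed_facts, templates)  # 2 facts x 2 templates = 4 examples
```

Even this slightly noisy expansion can improve coverage of the use case, which is the point: data generation is useful beyond low-data scenarios.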
Better tools like Lamini. We can make these models highly effective for different use cases. That said, there will always be a research frontier, but research has a >50% failure rate, so we incorporate only the research that proves effective. I will say, however, that the knowledge to build this toolset is stuck in a very small handful of people, often with top AI PhDs, who have not only built these systems in research but also deployed them to real use cases in production to billions of people. At Lamini, we have those people. OpenAI does too. But very few places have these people. And fewer can attract any of them to join, let alone the tens or hundreds of them needed to build these models and systems.
I'll admit it here. I'm part LeCunie. I think there are non-autoregressive models that will take the reins at some point, especially if it means being able to provide richer gradients/signals back into the model than what the vanilla transformer does now. Whether that's actually diffusion models is still an open research question.

💥 Miscellaneous – a set of rapid-fire questions
For a while, I had been publishing and tracking work applying AI to healthcare and climate change. It's exciting to see what we can do there. I like the emphasis on being creative with data, which matches real-world use cases far more than going after standard research benchmarks and datasets.
HuggingFace and OpenAI, if those count as frameworks. I’m impressed with both teams. I’m their target user and they sure target me well. They solve a clear need that I have: easy, flexible access to the latest and greatest foundation models.
They coexist, in my opinion. Why wouldn’t they? They serve different needs, even within the same user. I sometimes need an API, sometimes need the base open source model, sometimes need a no-code interface. And I’m someone who can consume all three. Some people can only consume a subset of these.
I'm interested to see a real contender with an architecture that can take in richer information through gradients, while keeping the amazing efficiency of self-attention. This might start with diffusion models, at least working in tandem with transformers to provide more informed gradients beyond just next-token prediction. It'll likely not be the instantiation of diffusion models that we know today, but it'll be interesting to see how to provide more information via SGD than we are doing with transformers today. (I know there's some early work today, but as of writing this, it hasn't been reproduced really well. We as a community move like lightning though, so I don't doubt we'll make leaps and strides soon.)