The Sequence Chat: Sharon Zhou: CEO, Lamini on RLHF and Fine-Tuning LLMs
Lamini has a front-row seat to some of the challenges of fine-tuning and building instruction-following LLMs.

👤 Quick bio
I had an unconventional childhood, with a lot of spontaneity, adventure, the outdoors, and notably, not a lot of technology. I studied ancient history because I liked learning about what has stood the test of time across millennia, from the ancient Greeks, Romans, and Chinese. I've always wanted to contribute something to society that could leave such a legacy. I found technology by way of user experience and especially one of its golden rules: it's never the user's fault. This tenet has compelled me to design technology so that people who, like me, hadn't grown up with it feel less intimidated by it. Naturally, I applied this principle as a product manager, first at ML/data startups, then at Google.

At Google, I learned that it was the type of company and legacy I wanted to build: it has been technologically transformative, dramatically changing the way we interact with technology and information. Modeling myself after Larry and Sergey, the founders of Google, I set out to drop out of Stanford's PhD program. The joke was that I "failed to drop out" and graduated, because I fell in love with deep generative models, now more popularly known as generative AI. After graduating, I taught and led a research group at Stanford as computer science faculty in generative AI, and also created and taught a Coursera course on generative AI to over 80,000 students, one of the largest courses in the world.

Now, I've started a company, Lamini, that combines my two loves: user experience and generative AI. Lamini is building a large language model (LLM) engine that makes it easy and fast for engineers without a top AI PhD to train these magical models, using fine-tuning, RLHF, and more (we're barely scratching the surface :)). My dream is to see far more LLMs out there in the world, covering the space of possible applications. I believe LLMs should be built for everyone, so building them needs to be accessible and usable by everyone.

🛠 ML Work
Lamini’s mission is to give every developer the superpowers that took the world from GPT-3 to ChatGPT! Quotes from our customers at Fortune 500 companies have really motivated us to build this. “Our team of 10 machine learning engineers hit the OpenAI fine-tuning API, but our model got worse — help!” Or even: “I don’t know how to make the best use of my data — I’ve exhausted all the prompt magic we can summon from tutorials online.” Lamini is an LLM engine that allows any developer, not just machine learning experts, to train high-performing LLMs, as good as ChatGPT, on large datasets with just a few lines of code from the Lamini library. Here’s our launch blogpost for more info!
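To give a feel for the "few lines of code" workflow described above, here is a minimal, hypothetical sketch of training an LLM on your own data through a hosted library. The class and method names are illustrative assumptions, not Lamini's actual API; the launch blog post and docs have the real interface.

```python
# Hypothetical sketch of a "train an LLM in a few lines" workflow.
# Class and method names are illustrative, NOT Lamini's actual API.
from typing import Dict, List


class HostedLLM:
    """Stand-in for a hosted LLM-engine client (illustrative only)."""

    def __init__(self, base_model: str):
        self.base_model = base_model
        self.examples: List[Dict[str, str]] = []

    def add_data(self, examples: List[Dict[str, str]]) -> None:
        # In a real engine, this would upload (input, output) pairs for fine-tuning.
        self.examples.extend(examples)

    def train(self) -> None:
        # In a real engine, this would kick off fine-tuning / RLHF on the hosted side.
        print(f"Training {self.base_model} on {len(self.examples)} examples...")

    def generate(self, prompt: str) -> str:
        # In a real engine, this would call the trained model.
        return f"[{self.base_model} response to: {prompt}]"


llm = HostedLLM(base_model="an-open-base-model")
llm.add_data([
    {"input": "How do I return an item?", "output": "Start a return from your Orders page."},
])
llm.train()
print(llm.generate("How do I return an item?"))
```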
Probably running PPO and fine-tuning the LM policy. There are a lot of optimizations that can be done to make this better, and it's still unclear whether it increases or decreases hallucinations. But first, a bit on RLHF: it's trying to unlock more from your data and align the model with how you want it to behave when you interact with it. Basing this a bit on Yoav Goldberg's note, RLHF works because it provides richer signals by way of (1) grading the model's outputs on a scale (instead of just right/wrong, you have varying levels of correctness in the form of rankings) and (2) feeding negative signal back into the model (instead of just the absence of bad). What's challenging about it is using another model, trained on additional data, to provide that richer signal back into your original transformer. For many non-transformer models, this would be easy. For transformers, it isn't so simple.
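To make the "running PPO and fine-tuning the LM policy" step concrete, here is a minimal sketch of one RLHF update using Hugging Face's trl library: sample a response from the policy, score it with a reward signal, and push that richer signal back into the policy's weights. The toy reward function and the tiny base model are illustrative assumptions, and trl's exact API has shifted across versions, so treat this as a sketch rather than a recipe.

```python
# Minimal PPO-style RLHF step (sketch; API details vary across trl versions).
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

model_name = "gpt2"  # illustrative small model; real setups use a larger instruction-tuned LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy (with a value head for PPO) and a frozen reference copy to keep the policy from drifting too far.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), policy, ref_policy, tokenizer)

def reward_fn(prompt: str, response: str) -> float:
    """Toy stand-in for a learned reward model trained on human rankings."""
    return float(len(response.split()) > 3)

prompt = "Explain RLHF in one sentence."
query = tokenizer(prompt, return_tensors="pt").input_ids[0]

# 1) Sample a response from the current policy.
output = ppo_trainer.generate(
    query, max_new_tokens=32, do_sample=True, pad_token_id=tokenizer.eos_token_id
)[0]
response = output[len(query):]
response_text = tokenizer.decode(response, skip_special_tokens=True)

# 2) Score it -- this is the "richer signal" coming from the reward model.
reward = torch.tensor(reward_fn(prompt, response_text))

# 3) One PPO step pushes that signal back into the policy's weights.
stats = ppo_trainer.step([query], [response], [reward])
```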
Data is a key component of building high-performing LLMs. Data is useful when it's high quality, large, and covers your entire use case. Usually, that's not the case out of the gate, even if you're Walmart. So that's where data generation, even if a bit noisy, comes into play to boost your LLM's performance further. That is to say, data generation is useful beyond low-data scenarios.
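As a rough illustration of what data generation can look like in practice, here is a small self-instruct-style sketch that uses an existing LLM to expand a handful of seed question-answer pairs into additional training examples. The `generate` helper, prompt format, and filtering are illustrative assumptions, not Lamini's actual pipeline.

```python
# Sketch: bootstrapping extra fine-tuning data from a few seed examples (self-instruct style).
import json
import random

def generate(prompt: str) -> str:
    """Hypothetical LLM call -- swap in your completion API or local model here."""
    # Canned output so the sketch runs end-to-end without a model.
    return "Q: How do I delete my account?\nA: Go to Settings > Account > Delete account."

seed_examples = [
    {"question": "How do I reset my password?", "answer": "Go to Settings > Account > Reset password."},
    {"question": "How do I export my data?", "answer": "Use the Export button on the Data page."},
]

def make_prompt(examples) -> str:
    shown = "\n".join(f"Q: {e['question']}\nA: {e['answer']}" for e in examples)
    return (
        "You write question-answer pairs for a product support assistant.\n"
        f"Here are some examples:\n{shown}\n"
        "Write one new, different pair in the same format."
    )

synthetic = []
for _ in range(100):
    raw = generate(make_prompt(random.sample(seed_examples, k=2)))
    # Light filtering: a bit of noise is fine, but malformed pairs are dropped.
    if "Q:" in raw and "A:" in raw:
        q, a = raw.split("A:", 1)
        synthetic.append({"question": q.replace("Q:", "").strip(), "answer": a.strip()})

# Write out JSONL ready to feed into fine-tuning alongside the real data.
with open("synthetic_finetune_data.jsonl", "w") as f:
    for row in synthetic:
        f.write(json.dumps(row) + "\n")
```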
Better tools like Lamini. We can make these models highly effective for different use cases. That said, there will always be a research frontier, but research has a >50% failure rate, so we incorporate only the research that proves effective. I will say, however, that the knowledge to build this toolset is stuck with a very small handful of people, often with top AI PhDs, who have not only built these systems in research but also deployed them to real use cases in production to billions of people. At Lamini, we have those people. OpenAI does too. But very few places have these people, and fewer can attract any of them to join, let alone the tens or hundreds needed to build these models and systems.
I'll admit it here: I'm part LeCunie. I think there are non-autoregressive models that will take the reins at some point, especially if it means being able to provide richer gradients/signals back into the model than what the vanilla transformer does now. Whether that's actually diffusion models is still an open research question.

💥 Miscellaneous – a set of rapid-fire questions
For a while, I had been publishing on and tracking AI applied to healthcare and climate change. It's exciting to see what we can do there. I like the emphasis on being creative with data, which matches real-world use cases far more than going after standard research benchmarks and datasets.
HuggingFace and OpenAI, if those count as frameworks. I’m impressed with both teams. I’m their target user and they sure target me well. They solve a clear need that I have: easy, flexible access to the latest and greatest foundation models.
They coexist, in my opinion. Why wouldn’t they? They serve different needs, even within the same user. I sometimes need an API, sometimes need the base open source model, sometimes need a no-code interface. And I’m someone who can consume all three. Some people can only consume a subset of these.
I'm interested to see a real contender with an architecture that can take in richer information through gradients, while keeping the amazing efficiency of self-attention. This might start with diffusion models, at least working in tandem with transformers to provide more informed gradients beyond just next-token prediction. It likely won't be the instantiation of diffusion models that we know today, but it'll be interesting to see how to provide more information via SGD than we do with transformers now (I know there's some early work, but as of writing this, it hasn't been reproduced really well; we as a community move like lightning, though, so I don't doubt we'll make leaps and strides soon).