The Sequence Chat: Sharon Zhou: CEO, Lamini on RLHF and Fine-Tuning LLMs
Lamini has a front-row seat to some of the challenges of fine-tuning and building instruction-following LLMs.

👤 Quick bio
I had an unconventional childhood, with a lot of spontaneity, adventure, the outdoors–and notably, not a lot of technology. I studied ancient history because I liked learning about what has stood the test of time, across millennia, from the Ancient Greeks, Romans, and Chinese. I’ve always wanted to contribute something to society that could leave such a legacy. I found technology by way of user experience and especially one of its golden rules: it’s never the user’s fault. This tenet has compelled me to design technology so that people who hadn’t grown up with it, like myself, feel less intimidated by it. Naturally, I applied this principle as a product manager, first at ML/data startups, then at Google. At Google, I learned that Google was the type of company and legacy I wanted to build: it has been technologically transformative, dramatically changing the way we interact with technology and information.

Modeling myself after Larry and Sergey, the founders of Google, I set out to drop out of Stanford’s PhD program. The joke was that I “failed to drop out” and graduated, because I fell in love with deep generative models—or, as they’re now more popularly known, generative AI. After graduating, I taught and led a research group at Stanford as computer science faculty in generative AI, and also created and taught a Coursera course on generative AI to over 80K students, one of the largest courses in the world.

Now, I’ve started a company, Lamini, that combines my two loves: user experience and generative AI. Lamini is building a large language model (LLM) engine that makes it easy and fast for engineers without a top AI PhD to train these magical models, using finetuning, RLHF, and more (we’re barely scratching the surface :)). My dream is to see far more LLMs out there in the world, covering the space of possible applications. I believe LLMs should be built for everyone, so building them needs to be accessible and usable by everyone.

🛠 ML Work
Lamini’s mission is to give every developer the superpowers that took the world from GPT-3 to ChatGPT! Quotes from our customers at Fortune 500 companies have really motivated us to build this. “Our team of 10 machine learning engineers hit the OpenAI fine-tuning API, but our model got worse — help!” Or even: “I don’t know how to make the best use of my data — I’ve exhausted all the prompt magic we can summon from tutorials online.” Lamini is an LLM engine that allows any developer, not just machine learning experts, to train high-performing LLMs, as good as ChatGPT, on large datasets with just a few lines of code from the Lamini library. Here’s our launch blogpost for more info!
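To make the "few lines of code" idea concrete, here is a minimal sketch of the kind of instruction fine-tuning loop such an engine wraps. It is not the actual Lamini API; it uses Hugging Face Transformers, and the tiny model name and toy dataset are illustrative assumptions.

```python
# Minimal sketch of an instruction fine-tuning loop (not the Lamini API).
# The tiny model and two-example dataset are stand-ins so the sketch runs quickly.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "sshleifer/tiny-gpt2"  # illustrative tiny model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction-following pairs standing in for your domain data.
examples = Dataset.from_dict({
    "text": [
        "### Instruction: Summarize our return policy.\n### Response: Returns are accepted within 30 days.",
        "### Instruction: Greet a new customer.\n### Response: Welcome! How can I help you today?",
    ]
})

def tokenize(batch):
    toks = tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)
    toks["labels"] = toks["input_ids"].copy()  # causal LM: predict the next token
    return toks

tokenized = examples.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=tokenized,
)
trainer.train()
```

In practice an engine like Lamini automates the parts this sketch leaves out: data formatting, hyperparameters, evaluation, and serving.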
Probably running PPO and finetuning the LM policy. There are a lot of optimizations that can be done to make this better, and it’s still unclear whether it increases or decreases hallucinations. But first, a bit on RLHF: it’s trying to unlock more from your data and align the model with how you want it to behave when you interact with it. Basing this a bit on Yoav Goldberg’s note, RLHF works because it provides richer signals by way of (1) grading the model’s outputs more finely (instead of just right/wrong, you have varying levels of correctness in the form of rankings) and (2) providing negative signal back into the model (instead of just the absence of bad). What’s challenging about it is using another model, trained on additional data, to provide that richer signal back into your original transformer. For many non-transformer models, this would be easy. For transformers, it isn’t.
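To illustrate that ranking-based signal, here is a minimal PyTorch sketch of the pairwise loss a reward model is commonly trained with before PPO uses its scores. The tiny reward head and random features are illustrative assumptions, not Lamini's or anyone else's implementation.

```python
# Pairwise (preference-ranking) reward model loss: the "richer signal" RLHF
# feeds back to the policy comes from a model trained to score a preferred
# completion above a less-preferred one.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Maps a pooled sequence representation to a scalar reward."""
    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, pooled_hidden: torch.Tensor) -> torch.Tensor:
        return self.score(pooled_hidden).squeeze(-1)

reward_model = TinyRewardModel()

# Pretend these are pooled hidden states for the "chosen" and "rejected"
# completions of the same batch of prompts.
chosen = torch.randn(4, 16)
rejected = torch.randn(4, 16)

# Push r(chosen) above r(rejected); this encodes both graded correctness
# and explicit negative signal, not just right/wrong labels.
loss = -torch.nn.functional.logsigmoid(
    reward_model(chosen) - reward_model(rejected)
).mean()
loss.backward()  # the gradient PPO later exploits when updating the policy
```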
Data is a key component to building high-performing LLMs. Data is useful when it’s high quality, large, and covering your entire use case. Usually, that’s not the case out of the gate, even if you’re Walmart. So that’s where data generation, even if a bit noisy, comes into play to boost your LLM’s performance further. That is to say, data generation is useful beyond low data scenarios.
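As a rough illustration of that kind of data generation, here is a minimal sketch that bootstraps noisy synthetic (instruction, response) pairs from a seed set using an existing generator. The model name and prompt template are illustrative assumptions, not a Lamini recipe; any instruction-tuned model would stand in for the tiny one used here.

```python
# Sketch of noisy synthetic data generation from a small seed set.
from transformers import pipeline

generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")  # illustrative

seed_instructions = [
    "Explain our shipping options.",
    "Draft a friendly out-of-office reply.",
]

synthetic_pairs = []
for instruction in seed_instructions:
    prompt = f"### Instruction: {instruction}\n### Response:"
    out = generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    response = out[len(prompt):].strip()  # keep only the generated continuation
    synthetic_pairs.append({"instruction": instruction, "response": response})

# These noisy pairs would be filtered and deduplicated before being added
# to the fine-tuning set.
print(synthetic_pairs)
```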
Better tools like Lamini. We can make these models highly effective for different use cases. That said, there will always be a research frontier, but research has a >50% failure rate, so we incorporate only the research that proves effective. I will say, however, the knowledge to build this toolset is stuck in a very small handful of people, often with top AI PhDs, who have not only built these systems in research, but also deployed them to real use cases in production to billions of people. At Lamini, we have those people. OpenAI does too. But very few places have these people. And fewer can attract any of them to join, let alone the tens or hundreds needed to build these models and systems.
I’ll admit it here. I’m part LeCunie. I think there are non-autoregressive models that will take the reins at some point, especially if it means being able to provide richer gradients/signals back into the model than what the vanilla transformer does now. Whether that’s actually diffusion models is still an open research question.

💥 Miscellaneous – a set of rapid-fire questions
For a while, I had been publishing on and tracking applied AI in healthcare and climate change. It’s exciting to see what we can do there. I like the emphasis on being creative with data, which matches real world use cases far more than going after standard research benchmarks and datasets.
HuggingFace and OpenAI, if those count as frameworks. I’m impressed with both teams. I’m their target user and they sure target me well. They solve a clear need that I have: easy, flexible access to the latest and greatest foundation models.
They coexist, in my opinion. Why wouldn’t they? They serve different needs, even within the same user. I sometimes need an API, sometimes need the base open source model, sometimes need a no-code interface. And I’m someone who can consume all three. Some people can only consume a subset of these.
I’m interested to see a real contender with an architecture that can take in richer information through gradients, while keeping the amazing efficiency of self-attention. This might start with diffusion models, at least working in tandem with transformers to provide more informed gradients beyond just next-token prediction. It likely won’t be the instantiation of diffusion models we know today, but it’ll be interesting to see how to provide more information via SGD than we do with transformers now (I know there’s some early work, but as of writing this, it hasn’t been reproduced really well. We as a community move like lightning though, so I don’t doubt we’ll make leaps and strides soon).