The Sequence Chat: Sharon Zhou: CEO, Lamini on RLHF and Fine-Tuning LLMs
Lamini has a front-row seat to some of the challenges of fine-tuning and building instruction-following LLMs.

👤 Quick bio
I had an unconventional childhood, with a lot of spontaneity, adventure, the outdoors, and notably, not a lot of technology. I studied ancient history because I liked learning about what has stood the test of time across millennia, from the ancient Greeks, Romans, and Chinese. I've always wanted to contribute something to society that could leave such a legacy. I found technology by way of user experience and especially one of its golden rules: it's never the user's fault. This tenet has compelled me to design technology so that people who, like me, hadn't grown up with it feel less intimidated by it. Naturally, I applied this principle as a product manager, first at ML/data startups, then at Google.

At Google, I learned that it was the type of company and legacy I wanted to build: it has been technologically transformative, dramatically changing the way we interact with technology and information. Modeling myself after Larry and Sergey, the founders of Google, I set out to drop out of Stanford's PhD program. The joke was that I "failed to drop out" and graduated, because I fell in love with deep generative models, now more popularly known as generative AI. After graduating, I taught and led a research group at Stanford as computer science faculty in generative AI, and also created and taught a Coursera course on generative AI to over 80,000 students, one of the largest courses in the world.

Now, I've started a company, Lamini, that combines my two loves: user experience and generative AI. Lamini is building a large language model (LLM) engine that makes it easy and fast for engineers without a top AI PhD to train these magical models, using fine-tuning, RLHF, and more (we're barely scratching the surface :)). My dream is to see far more LLMs out there in the world, covering the space of possible applications. I believe LLMs should be built for everyone, so building them needs to be accessible and usable by everyone.

🛠 ML Work
Lamini’s mission is to give every developer the superpowers that took the world from GPT-3 to ChatGPT! Quotes from our customers at Fortune 500 companies have really motivated us to build this. “Our team of 10 machine learning engineers hit the OpenAI fine-tuning API, but our model got worse — help!” Or even: “I don’t know how to make the best use of my data — I’ve exhausted all the prompt magic we can summon from tutorials online.” Lamini is an LLM engine that allows any developer, not just machine learning experts, to train high-performing LLMs, as good as ChatGPT, on large datasets with just a few lines of code from the Lamini library. Here’s our launch blogpost for more info!
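To give a feel for the "few lines of code" workflow described above, here is a minimal, hypothetical sketch of training an LLM on your own data through a hosted library. The class and method names are illustrative assumptions, not Lamini's actual API; the launch blog post and docs have the real interface.

```python
# Hypothetical sketch of a "train an LLM in a few lines" workflow.
# Class and method names are illustrative, NOT Lamini's actual API.
from typing import Dict, List


class HostedLLM:
    """Stand-in for a hosted LLM-engine client (illustrative only)."""

    def __init__(self, base_model: str):
        self.base_model = base_model
        self.examples: List[Dict[str, str]] = []

    def add_data(self, examples: List[Dict[str, str]]) -> None:
        # In a real engine, this would upload (input, output) pairs for fine-tuning.
        self.examples.extend(examples)

    def train(self) -> None:
        # In a real engine, this would kick off fine-tuning / RLHF on the hosted side.
        print(f"Training {self.base_model} on {len(self.examples)} examples...")

    def generate(self, prompt: str) -> str:
        # In a real engine, this would call the trained model.
        return f"[{self.base_model} response to: {prompt}]"


llm = HostedLLM(base_model="an-open-base-model")
llm.add_data([
    {"input": "How do I return an item?", "output": "Start a return from your Orders page."},
])
llm.train()
print(llm.generate("How do I return an item?"))
```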
Probably running PPO and fine-tuning the LM policy. There are a lot of optimizations that can be done to make this better, and it's still unclear whether it increases or decreases hallucinations. But first, a bit on RLHF: it's trying to unlock more from your data and align the model with how you want it to behave when you interact with it. Basing this a bit on Yoav Goldberg's note, RLHF works because it provides richer signals by way of (1) grading the model's outputs on a scale (instead of just right/wrong, you have varying levels of correctness in the form of rankings) and (2) feeding negative signal back into the model (instead of just the absence of bad). What's challenging about it is using another model, trained on additional data, to provide that richer signal back into your original transformer. For many non-transformer models, this would be easy. For transformers, it isn't so simple.
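To make the "running PPO and fine-tuning the LM policy" step concrete, here is a minimal sketch of one RLHF update using Hugging Face's trl library: sample a response from the policy, score it with a reward signal, and push that richer signal back into the policy's weights. The toy reward function and the tiny base model are illustrative assumptions, and trl's exact API has shifted across versions, so treat this as a sketch rather than a recipe.

```python
# Minimal PPO-style RLHF step (sketch; API details vary across trl versions).
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

model_name = "gpt2"  # illustrative small model; real setups use a larger instruction-tuned LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy (with a value head for PPO) and a frozen reference copy to keep the policy from drifting too far.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), policy, ref_policy, tokenizer)

def reward_fn(prompt: str, response: str) -> float:
    """Toy stand-in for a learned reward model trained on human rankings."""
    return float(len(response.split()) > 3)

prompt = "Explain RLHF in one sentence."
query = tokenizer(prompt, return_tensors="pt").input_ids[0]

# 1) Sample a response from the current policy.
output = ppo_trainer.generate(
    query, max_new_tokens=32, do_sample=True, pad_token_id=tokenizer.eos_token_id
)[0]
response = output[len(query):]
response_text = tokenizer.decode(response, skip_special_tokens=True)

# 2) Score it -- this is the "richer signal" coming from the reward model.
reward = torch.tensor(reward_fn(prompt, response_text))

# 3) One PPO step pushes that signal back into the policy's weights.
stats = ppo_trainer.step([query], [response], [reward])
```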
Data is a key component of building high-performing LLMs. Data is useful when it's high quality, large, and covers your entire use case. Usually, that's not the case out of the gate, even if you're Walmart. So that's where data generation, even if a bit noisy, comes into play to boost your LLM's performance further. That is to say, data generation is useful beyond low-data scenarios.
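As a rough illustration of what data generation can look like in practice, here is a small self-instruct-style sketch that uses an existing LLM to expand a handful of seed question-answer pairs into additional training examples. The `generate` helper, prompt format, and filtering are illustrative assumptions, not Lamini's actual pipeline.

```python
# Sketch: bootstrapping extra fine-tuning data from a few seed examples (self-instruct style).
import json
import random

def generate(prompt: str) -> str:
    """Hypothetical LLM call -- swap in your completion API or local model here."""
    # Canned output so the sketch runs end-to-end without a model.
    return "Q: How do I delete my account?\nA: Go to Settings > Account > Delete account."

seed_examples = [
    {"question": "How do I reset my password?", "answer": "Go to Settings > Account > Reset password."},
    {"question": "How do I export my data?", "answer": "Use the Export button on the Data page."},
]

def make_prompt(examples) -> str:
    shown = "\n".join(f"Q: {e['question']}\nA: {e['answer']}" for e in examples)
    return (
        "You write question-answer pairs for a product support assistant.\n"
        f"Here are some examples:\n{shown}\n"
        "Write one new, different pair in the same format."
    )

synthetic = []
for _ in range(100):
    raw = generate(make_prompt(random.sample(seed_examples, k=2)))
    # Light filtering: a bit of noise is fine, but malformed pairs are dropped.
    if "Q:" in raw and "A:" in raw:
        q, a = raw.split("A:", 1)
        synthetic.append({"question": q.replace("Q:", "").strip(), "answer": a.strip()})

# Write out JSONL ready to feed into fine-tuning alongside the real data.
with open("synthetic_finetune_data.jsonl", "w") as f:
    for row in synthetic:
        f.write(json.dumps(row) + "\n")
```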
Better tools like Lamini. We can make these models highly effective for different use cases. That said, there will always be a research frontier, but research has a >50% failure rate, so we incorporate only the research that proves effective. I will say, however, that the knowledge to build this toolset is stuck with a very small handful of people, often with top AI PhDs, who have not only built these systems in research but also deployed them to real use cases in production to billions of people. At Lamini, we have those people. OpenAI does too. But very few places have these people, and fewer can attract any of them to join, let alone the tens or hundreds needed to build these models and systems.
I'll admit it here: I'm part LeCunie. I think there are non-autoregressive models that will take the reins at some point, especially if it means being able to provide richer gradients/signals back into the model than what the vanilla transformer does now. Whether that's actually diffusion models is still an open research question.

💥 Miscellaneous – a set of rapid-fire questions
For a while, I had been publishing on and tracking AI applied to healthcare and climate change. It's exciting to see what we can do there. I like the emphasis on being creative with data, which matches real-world use cases far more than going after standard research benchmarks and datasets.
HuggingFace and OpenAI, if those count as frameworks. I’m impressed with both teams. I’m their target user and they sure target me well. They solve a clear need that I have: easy, flexible access to the latest and greatest foundation models.
They coexist, in my opinion. Why wouldn’t they? They serve different needs, even within the same user. I sometimes need an API, sometimes need the base open source model, sometimes need a no-code interface. And I’m someone who can consume all three. Some people can only consume a subset of these.
I'm interested to see a real contender with an architecture that can take in richer information through gradients, while keeping the amazing efficiency of self-attention. This might start with diffusion models, at least working in tandem with transformers to provide more informed gradients beyond just next-token prediction. It likely won't be the instantiation of diffusion models that we know today, but it'll be interesting to see how to provide more information via SGD than we do with transformers now (I know there's some early work, but as of writing this, it hasn't been reproduced really well; we as a community move like lightning, though, so I don't doubt we'll make leaps and strides soon).