This has been an experiment! That's why it took so long to send this post out to you.
Recently, I was at the HumanX conference and ended up having a few really interesting interviews. Starting a video interview series wasn’t the plan, but the speakers were so good – I couldn’t resist. So now I’ve got a bunch of recordings (+ more interviews in the pipeline!) and thought: what can I do – as a time-strapped mother of five – with AI tools? Can I actually produce a couple of episodes on my own, using the tech I write about?
|
My main tool was CapCut for video editing (paid version) – lots of built-in AI features. The coolest one: no need to manually cut scenes out. Generate a transcript, and then, by editing the text, you also edit the video. Works very smoothly. I also used Scribe from ElevenLabs (free) for transcription (usually I use Otter, paid), Claude 3.7 Extended (paid) for interview editing (ChatGPT truly sucks at that), and ChatGPT 4.5 (paid) for highlights, quotes, and TLDRs.
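If you'd rather script the transcription step than click through apps, here's a minimal sketch using the open-source whisper library as a free stand-in for Scribe or Otter (which I used via their apps, not code):

```python
# A sketch assuming the open-source whisper library (pip install openai-whisper)
# as a free stand-in for Scribe/Otter. The filename is just a placeholder.
import whisper

model = whisper.load_model("base")
result = model.transcribe("interview_ep001.mp3")
print(result["text"])  # the full transcript

# Per-segment timestamps are what make text-driven video editing possible:
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s -> {seg['end']:.1f}s] {seg['text']}")
```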
First of all: what these tools can do now – it’s pure magic. |
Quick reminder – it’s THE LAST DAY to get a massive 41% OFF for our full archive and all upcoming deep dives and business profiles 👉
|
|
But also – because AI tools offer so many options (do you want to keep or remove the background? How about an AI-generated background? What about a voiceover? etc., etc.) they take time to learn. Once you figure them out, being a one-person media team feels surprisingly doable. That said, I wouldn’t necessarily recommend it. After a few nights with CapCut and the rest, I can tell you: it’s a lot. Writing an article about MCP was way easier.
Anyway – meet the spontaneously emerging podcast: Turing Post / Inference. |
Where we talk to great minds to uncover insights on AI, tech, and our human future – turning deep conversations into something useful, actionable, and thought-provoking. |
In episode 001: the incredible Sharon Zhou. She’s a generative AI trailblazer, a Stanford-trained protégé of Andrew Ng – who, along with Andrej Karpathy and others, is also an investor in her company Lamini. From co-creating one of Coursera’s top AI courses to making MIT Technology Review’s prestigious “35 Under 35” list, Sharon turns complex tech into everyday magic. From ancient texts to modern code, she bridges Harvard and Silicon Valley – building AI that’s grounded, powerful, and made for people.
She is also super fun to talk to! |
|
đź“ť Executive Summary |
- From surreal images to enterprise-grade LLMs: Sharon’s journey spans early generative models to founding Lamini, where teams fine-tune models for real-world accuracy.
- Hallucinations are fixable: She explains how Lamini surgically edits model weights (not prompts), raising factual accuracy from 6-30% to 90%+.
- Benchmarks ≠ reality: Enterprise clients don’t care about MMLU or Spider – they need models that work on their complex data.
- Agents & RAG? Overhyped (kind of): Sharon breaks down why these buzzwords resonate with non-experts, even if they puzzle researchers.
- Teaching with memes, not math: Sharon makes complex AI intuitive – whether it’s for developers, policymakers, or anyone else.
- Big idea: More people – not just researchers – should be able to steer and shape AI behavior.
|
Watch the video! Subscribe to our YT channel, and/or read the transcript (edited for clarity and brevity) ⬇️. |
Turing Post / Inference. Ep 001: Sharon Zhou, Lamini (and so much more)
|
|
GenAI then and now |
You've been doing generative AI since way before it became fancy. And you taught a huge Coursera course. How many people?
I now teach millions of people. But it started with one. |
So it started with one. Then it was like a hundred thousand. Now it's millions. How do you think the audience has changed from when you started?
I think the biggest thing is how commercially relevant it is today. Initially it was just a magical piece of technology. You could almost – or at least I could – see the promise of it, where it was going. Of course, the outputs it was producing were not really usable. The image-based models were producing images that looked like nightmares, the kind of people that would show up in horror movies. But that was considered good. Better than just random pixels.
That was really compelling at the time. And now we can generate videos that look completely realistic. I started working on generative AI around 2016-17. Now it's a completely different point.
I remember thinking about GenAI in 2019 – I was creating this concept of media that would use generated images, voices, anchors. No one believed me that it was going to happen.
You are a visionary. |
It was just amazing to see these horrifying pictures and be fascinated.
Oh yes absolutely! I feel like it weirdly turned on my maternal instinct. I was like: these are my children. Because you could see how the model improves over time. Over time you see eyes, as it trains, and then you see a full face. And then you're like – oh my gosh – I nurtured it to get to this stage. I tuned the right hyperparameters. I put in the right data to get it into the right place. |
I would call them my children. And I think my PhD advisor Andrew Ng was like: that's a little weird, but okay. Honestly, I kept screenshots of these generated faces on my computer – they looked really weird at the time.
Now it's ubiquitous. People are generating things all over the place and it's magical seeing how creative people are. That's something that makes me really happy – I feel like a lot of people have also seen the magic in the technology. It's not just me that's been captivated by it.
Yes, it's magical. How did you move from working with images to what you do now at Lamini? |
Empowering Developers & Democratization |
That's a good question. Before my PhD I was a product manager, but before being a PM I actually studied classics – Latin and ancient Greek literature. And I loved languages. So all of that kind of combines into language and communication for product, and just story and storytelling. I love putting that all together, and so I started working on language as well. I was head of AI at a research nonprofit. Not OpenAI! A nonprofit for aligning language models – similar to OpenAI, safety for these LLMs. That was probably around 2020-2021. Then I started working on language models more, because they really took off from a commercial standpoint. And I felt like the use cases could make a very big difference in the enterprise, where I have my experience in product.
A few things kind of drove me to start Lamini. One was seeing all these amazing foundation models. I thought: wouldn't it be even more magical if more people in the world could steer these models? More people could define safety, more people could define where these models could grow and what these models could be capable of. They could really steer what this model behavior should be and what the model's knowledge should be – which is currently a bit more confined to the likes of OpenAI and Anthropic, etc.
There are 24 million developers out there, probably more now with GenAI – what if we could give them the keys to doing this? Then I think we'd be able to build better models.
I also believe that people who understand their own problems best – when given the right tools – are best suited to solve them. During my PhD, I did some projects that were applied in healthcare. We collaborated with the medical school at Stanford, and it was very clear to me that the doctors had such deep insight into what machine learning could actually help with. It was not super obvious to me how this could help with cancer: oh, it's predicting this bacteria that predicts cancer! I just didn't know that, but they understood that very deeply, and how it could be helpful in diagnostics. I feel like for all these disciplines, if we could give this magical technology as a tool to all these people – that would just enable so much more.
That was kind of the seed of starting Lamini, and then Lamini grew into finding what vertical of intelligence you wanna help people steer these models to first. I talked to 300 potential customers and came up with this idea – initially it was helping people access their structured data via text-to-SQL. As I talked to more customers, it morphed into "we need to be able to steer these models further" – not just for this vertical but more broadly, in a horizontal platform. So then it turned into fine-tuning these models, editing these models directly, taking it to the next step, not just prompting them.
The next stage was realizing: oh, actually we don't wanna cover everything, that's a lot for a startup to do, it also is confusing to the end customer. So let's focus on one vertical of intelligence to do that really well. |
Technical Approach to Hallucination |
And the biggest problem that we saw was hallucinations. We decided not just to ask but to really inspect the customers' data and understand what their objective was exactly – what hallucinations were to them. And we realized it was actually a technical problem, not a philosophical one. For many of them, it was just that certain tokens needed to be a little bit more deterministic in certain contexts and couldn't be made up.
I'm sure there are multiple definitions of hallucinations, but that was one of them. Once we were able to frame the problem correctly, we were able to solve it technically.
Now, was it easy? No. We had to do surgery on the models; we had to edit the way the models were post-trained and fine-tuned. We created a new recipe that alters the LoRA adapter layer of these models and turns it into a mixture-of-experts, which is effectively equivalent to learning a retriever on a learned index. Like putting that into the adapter layer, as opposed to an external retriever over an index, which is what RAG is. So putting that into the weights of the model so that it could retrieve these facts very, very accurately – at extreme accuracy levels, to nines of accuracy.
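To make that concrete, here is a minimal sketch of the general pattern Sharon describes – our illustration, not Lamini's actual recipe: a bank of LoRA-style adapters acting as "memory experts", with a router that retrieves a few of them per token, like a learned index baked into the weights.

```python
# Our illustration, not Lamini's actual recipe: many low-rank (LoRA-style)
# adapters act as "memory experts"; a router retrieves a few per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfMemoryExperts(nn.Module):
    def __init__(self, d_model, rank=8, n_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a low-rank update: d_model -> rank -> d_model.
        self.down = nn.Parameter(torch.randn(n_experts, d_model, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(n_experts, rank, d_model))  # zero-init: starts as a no-op
        self.router = nn.Linear(d_model, n_experts)  # plays the role of the learned retriever

    def forward(self, x, base_out):
        # x, base_out: (batch, seq, d_model); base_out comes from the frozen base layer.
        weights, idx = self.router(x).topk(self.top_k, dim=-1)  # retrieve top-k "memories" per token
        weights = F.softmax(weights, dim=-1)
        delta = torch.zeros_like(base_out)
        for k in range(self.top_k):
            e = idx[..., k]  # (batch, seq) expert ids
            h = torch.einsum('bsd,bsdr->bsr', x, self.down[e])  # down-project through chosen expert
            delta += weights[..., k:k+1] * torch.einsum('bsr,bsrd->bsd', h, self.up[e])
        return base_out + delta  # frozen output plus the retrieved correction
```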
But it was not impossible. And that was when OpenAI was releasing papers like "hallucinations are by design" – as if these models are designed to hallucinate.
When people talk about hallucinations it sometimes feels like a vibe thing. But it's actually a technical problem! So once you framed it – how do you measure the results? Do you have any metrics or an evaluation system?
I think two types of benchmarks matter most. One is a comparison with a base model on a Wikipedia page – for example, the Golden Gate Bridge page. The base model just hallucinates on the facts in it, versus our model, which doesn't. So it goes from something like 30% to 90% accuracy.
Then of course for our customers, often enterprises like Colgate – they don't really care about general evals. They don't care about MMLU, or how the models do on math; none of this matters to them. Many of them don't even care about the text-to-SQL benchmarks out there – for example, Spider and BIRD, which are very common text-to-SQL benchmarks in the ML community and have been around for a while. They don't care, because if a model does well on those, it doesn't mean it does well on their pretty complex schema. If you go inspect it, it's actually an extremely different task. Not just that the schemas are much more complex, but the task itself is ill-defined relative to what the enterprise actually wants the model to do.
So what we do instead is: okay, let's inspect your own internal evals. Let's actually help you craft them in a way that is easy – and we have a framework for that. They can just follow that easy/medium/hard breakdown – let's keep it simple – and get the model to actually tackle easy, medium, and hard questions. Typically our threshold is around 90% – cause I always want to see a 9 in there. For Colgate it went from 30% accuracy using OpenAI's latest model; for another Fortune 500, with even more complex stuff, from 6% to 90%.
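As a rough illustration of what such an internal eval could look like (our sketch, not Lamini's framework; model_answer_fn and the grading check are stand-ins), assuming a list of expert-approved questions bucketed by difficulty:

```python
# Our sketch: grade a model's answers on expert-approved questions
# bucketed by difficulty, and report per-bucket accuracy against a threshold.
from collections import defaultdict

THRESHOLD = 0.90  # "I always want to see a 9 in there"

def evaluate(model_answer_fn, eval_set):
    """eval_set: list of dicts with 'question', 'expected', 'difficulty'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item in eval_set:
        answer = model_answer_fn(item["question"])
        totals[item["difficulty"]] += 1
        # Deliberately simple, *objective* check: experts agreed in advance
        # on the expected answer, so grading is unambiguous.
        if item["expected"].lower() in answer.lower():
            hits[item["difficulty"]] += 1
    for level in ("easy", "medium", "hard"):
        if totals[level]:
            accuracy = hits[level] / totals[level]
            print(f"{level:>6}: {accuracy:.0%} {'PASS' if accuracy >= THRESHOLD else 'FAIL'}")
```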
And the most magical thing for me is seeing them do it. I was not there, I did not write a line of code for them. They did it, their developers did it. They're able to steer these models to that extreme accuracy. |
AI Misconceptions & Market Trends |
You have such a deep understanding on so many levels; most people are not that knowledgeable. What is the most common misconception you hear from clients? What do you need to constantly explain to them?
I think there are a few things that we AI researchers find... funny is not the right word; maybe we're just surprised that the market is interested in them:
One is just the word agent. |
I talked to my friends from OpenAI, Anthropic – they're like: "I don't know why the market cares about this, but I guess we have to do this for marketing." It's interesting, because my gut reaction was also an allergic reaction. But then I double-clicked more and started to talk to more and more people about it, including non-AI researchers. That's where it really clicked for me: this is just a different view into the same thing. I think in a very model-centric way, because I'm very comfortable with these models and that's how I've been working for the past decade. But other people might not view it that way. They view it as "how does this AI interact with a person or mimic a person". I think the agent view of the world is centered around a human or an individual – the AI as an individual, as opposed to AI as a model. I think the equivalent in software engineering is OOP (object-oriented programming). That's object-centered, as opposed to functional programming.
So it's just a different view into almost effectively the same thing that you can accomplish. |
The other thing is around RAG. |
I know, I know! I think people are mesmerized by RAG. But those of us who've been in the space for a long time are surprised they are, because the ultimate retrieval is Google. The ultimate retrieval has been built, and it's not AI – but people view RAG as AI. It's actually information retrieval, which has been around for much longer than AI, but it plugs into AI, so it connects to the AI brain. It's just fascinating to me that it's confused with AI. There's no backpropagation. It's effectively what you put into the prompt, and of course it affects the model as input. But from a technology standpoint, that doesn't really mean anything.
When I think about it along with agents, it actually kind of makes sense because then it's like the full agent does all these things and comes together and it's this entity that's an individual. And then it kind of makes sense that they view this whole system as an AI, as opposed to just the model piece. |
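A minimal sketch of Sharon's point (our illustration): RAG is ordinary retrieval whose results get pasted into the prompt – no backpropagation, and the model itself is untouched:

```python
# Our illustration: "RAG" is retrieval whose results get pasted into the
# prompt. No gradients, no training - the model itself is untouched.
# (Toy term-overlap retriever; any search engine would do.)
def retrieve(query, documents, k=3):
    # Classic information retrieval: rank documents by naive term overlap.
    words = query.lower().split()
    scored = sorted(documents, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:k]

def rag_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    # The only "AI" involvement: retrieved text becomes part of the model input.
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"
```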
You just crushed the two hottest topics!
No, no, they're important to understanding things. Why are they so big in the market? Because they explain a lot to people. I think it's actually almost the user experience or interface that makes the most sense to other people. It's just funny that in some ways experts like myself have an expert blind spot. We're so used to viewing the world in a model-centric way that it's almost a little confusing to us to see it that way.
But I actually think it was a media person who opened my eyes the most. When I had a conversation with her, I realized: oh wow, having the agent view of things actually makes way more sense to her in terms of what AI does and how AI actually impacts the world. Otherwise, if it's model-centric, she's like: how does that really impact the world? But when it's an agent-centric thing, it suddenly is very clear what it is. It also matches all the sci-fi movies better. It just clicks.
Enterprise Use Cases |
A lot of people ask about and like to learn about use cases. You mentioned Colgate. What are other industry use cases that you are proud of?
There's one that I'm very excited about. It's a Global 2000 biotech company, one of our customers, and it's actually crazy what they're trying to do.
They're trying to bring a three-to-five-year timeline of regular cancer research down to just weeks. If you just step back and think about it – that transforms an industry completely. That fundamentally will change human health. So I'm very excited about that, cause it not only helps them make money but also does good for the world.
I think that's transformational in a way that I can't even imagine how that industry will operate moving forward. Now this one company is gonna leapfrog other companies who can't do this. So that's very exciting. |
What they're doing specifically is using our platform on premise, and they are combining both public patent data and private PII data, because that's how they get the best information. But they're doing it in a private way – it's safe and the data is secure. They're not comfortable sending it away. It needed to be highly accurate and specialized, and the goal was to get "better than a scientist".
Is it important for you that the companies have super structured data?
No – actually the most important thing for us is that these companies' use cases have objective outputs. Meaning: you can tell me what's good/better/best, and your experts will agree with each other on it.
The more subjective it is, the more you're basically giving the model multiple north stars. It doesn't know where to go. It can optimize for one, but then it'll just be worse at the others. These models are really good at optimization, and you wanna be very clear where your north star is. You don't wanna have a giant fuzzy blob and then have everyone upset that it didn't go the right way.
The biggest challenge we often face with our customers initially is just scoping that use case in a way where it is an objective use case. |
So that's why we actually like text-to-SQL. Not only does it permeate every enterprise – every enterprise has been putting their data into structured formats for decades, and their most valuable data is there, cause that's how we've done analytics historically.
Not only that, but they also like that when the SQL query fails, we can all agree it failed.
You have to be very clear what you want. And in the text-to-SQL case it's just so clear, it's already laid out. |
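To see why it's so objective, here's a minimal sketch (our illustration, using SQLite for simplicity): execute the generated query, compare it against a gold query, and either the results match or the query failed – everyone can agree on which:

```python
# Our illustration of an objective text-to-SQL check: run the generated
# query and compare its results with a gold query's results.
import sqlite3

def sql_matches(db_path, generated_sql, gold_sql):
    conn = sqlite3.connect(db_path)
    try:
        try:
            got = conn.execute(generated_sql).fetchall()
        except sqlite3.Error:
            return False  # the query failed - everyone agrees it failed
        want = conn.execute(gold_sql).fetchall()
        # Order-insensitive comparison of result sets.
        return sorted(map(repr, got)) == sorted(map(repr, want))
    finally:
        conn.close()
```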
Do you think you'll go from hallucinations to covering some other areas? What are they?
Okay, I can't say exactly what's next, because I do think hallucinations will take up a lot of time for us, and I think our customers will help show what should be prioritized next. But in terms of ideas for what's next: one is creativity. What if these models could be even more creative than they are now? Optimizing for that could be one thing. But I don't know how valuable that is. For us to create a business that does work and does bring in revenue, we do have to kind of follow where the value is right now. Creativity might be the same amount of work, but our customers value this hallucination piece a lot more today.
Research & Open Source |
For my news digest, I go through hundreds of papers every week. And I came across this paper, it's called "How to steer LLM latents for hallucination detection?" I thought that was very close to what we're discussing. I don't know if you've heard about it. |
Tell me the premise! |
They suggest a truthfulness separator vector that basically nudges the representations of truthful statements away from false ones, to better separate them in the latent space. So my question was, first of all: how do you follow research in this area? And second: is it more research with your own team, or do you look outside and implement what other people do? What is your process with research?
Yeah, that's a really good question. A few things around research: it's probably a combination of making sure we read the highlights and learn the general trends. Research papers are interesting, but just from my experience, they don't often translate to commercial value or to actual commercial results. So there are cases where we will gather information, form an understanding of whether what researchers are proposing is true, and then develop a hypothesis about whether we should go inspect that for our particular case.
That makes sense. Do you open source? |
We have some stuff open-sourced; we've published papers before. We do have some things that are open, but not everything. Our core IP – how we do that post-training on the model to keep it factual – is not open-sourced as of today.
It's funny that you bring up this paper – it matches the thesis that you can actually steer the weights toward higher factuality. I fully believe that you can do that. And you can do it in many different ways. One way: we were able to do fine-tuning without supervision. The reason we're able to do that is cause we have automated data pipelines for these models – we have agents that edit the model's own training data. So that's what we do. But if you read the DeepSeek paper – they do a similar thing. They have a factuality validator to train their reward model. So there's that ability to incorporate factuality in multiple different ways through backpropagation. That's what that paper shows, and I fully believe that thesis – that you can make the models more factual.
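A minimal sketch of the general pattern described here – our guess at the shape, not Lamini's or DeepSeek's actual pipeline: generate candidate training pairs from source documents and keep only the ones a factuality validator confirms are grounded:

```python
# Our guess at the general shape, not any company's actual pipeline:
# generate candidate Q/A pairs from source documents, keep only pairs
# whose answers are grounded in the source, then fine-tune on those.
def build_factual_dataset(documents, generate_qa, is_grounded):
    """generate_qa(doc) -> list of (question, answer) candidates;
    is_grounded(answer, doc) -> bool, the factuality validator."""
    dataset = []
    for doc in documents:
        for question, answer in generate_qa(doc):
            # The validator is the key step: only facts supported by the
            # source survive, so training pushes the model toward factuality.
            if is_grounded(answer, doc):
                dataset.append({"question": question, "answer": answer})
    return dataset
```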
What's your general take on open source? |
Oh, I love the way open source is going. We leverage a lot of open source. Usually our customers are actually comparing an open-source model – like Llama or DeepSeek – that they post-train and edit with our system on their own dataset so it doesn't hallucinate on their data, against either OpenAI- or Claude-based models. That's typically what's going on. We depend so much on open source, and we're partnered very closely with Meta in particular, and a bit with Mistral. The DeepSeek stuff has been very helpful for us – to pass it on to our customers, of course. Some of our pipelines do benefit from reasoning. Our thesis is for us, as infrastructure, not to own the models. It's for our customers to own them. Because the models are deep derivatives of their data at the end of the day, we want them to be able to effectively apply their same data governance to the models, in the weights. We don't need to see them – we're just the infrastructure that helps them modify it.
Education, Accessibility & Philosophy |
In your bio on LinkedIn you wrote "there are only a few hundred of us AI experts, who can successfully teach and control AI. The future would suck, if we were the only ones who could define what intelligence could be". I also find the lack of knowledge about AI, combined with the pace of its development, jarring and concerning. Tell me what you're thinking. What should we do about this lack of knowledge? |
I think there are a few things we can do. |
One is lowering the barrier to entry — making these tools much easier to use. The fine-tuning and post-training recipe we’ve developed is already easier than, say, OpenAI’s API. Colgate, for example, can use it successfully. Over time, this ease of use could become the norm, even for experts — the way smartphones became universal, even though that wasn’t their original intention. I really believe post-training methods, including fine-tuning, can become as simple as prompting. And if you can Google search, you're already prompting. |
We also need more automation and smarter system design — tools that understand what users want to influence without overwhelming them. You don’t need to see every hyperparameter or understand the math. Most of it doesn’t matter for your use case anyway. |
The other part is leveling people up — and that’s where teaching comes in. My background is in classics, not computer science. I only got into CS because a professor once said, “It’s never the user’s fault.” That really stuck with me. I was the one struggling with tech growing up — and I realized: if I design for that version of myself, it’ll be easier for everyone. |
That’s what led me to product management — thinking deeply about the end user, with empathy and compassion. And then I fell in love with the magic of generative AI. |
So that’s what I try to combine in both my company and courses: make it more accessible, lower the barriers, and help people level up. I don’t think it has to be intimidating. Sometimes the math is actually simple. I once gave a master class to policymakers in DC on neural networks using just multiplication and addition — and they got it. |
It really is possible. When it all comes together, it feels magical. |
I've thought about starting something like a TikTok for education – well, essentially, edutainment.
I'm thinking about the kids – they're AI natives. And I believe they do need to know what this technology is, what machine learning is, and what all of those things are. Say, prompting – it is not that easy, because you need to know how to communicate with a computer. And it's different with fine-tuning, because you need more technical knowledge. So yes, please do TikTok :)
If one searches hard enough, one may be able to find some stuff there! Learning about this stuff can be very engaging and fun. For a class I taught at Stanford previously, I actually created 100 memes to help people study. Cause why not laugh when you're studying. And one of the extra credits I gave people was: if you create a meme and I laugh at it – okay, fine, I don't have to laugh that hard, but if you created a meme and it was funny – you demonstrated your knowledge of the concept. If you understand something to the point that you can make a joke about it, then you get an extra credit point. So why not make it fun!
Thank you so much for this conversation! |
Do leave a comment |
|