How to Improve Your LLM: Combine Evaluations with Analytics
Tomasz Tunguz, Venture Capitalist
The future of LLM evaluations resembles software testing more than benchmarks. Real-world testing looks like asking an LLM to produce Dad jokes, such as this zinger: I’m reading a book about gravity & it’s impossible to put down.

Machine learning benchmarks provide a high-level comparison of relative model performance. Examples include those Google published for Gemini last week, precision and recall for classifying dog & cat photos, and the BLEU score for measuring machine translation. But they aren’t enough to satisfy a product team that its LLM-enabled product will perform well in the wild.

LLMs are tricky. They don’t always give the same answer to the same or similar input. Sometimes 1 can be greater than 4. This is called non-determinism.

How to solve this problem? To produce high-quality LLM products, teams will need to combine analytics with evaluation. Analytics surface the questions users ask the model. Those questions become the evaluations product teams use to measure performance. The teams then gather additional data, retrain/fine-tune the model, & release it again.

Today, evaluations are rule-based or human-in-the-loop. In the future, other models will judge the output to ensure consistency over time. With each turn of the iteration wheel, the model improves, ensuring that its Dad jokes really are the best.
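Here is a minimal Python sketch of the analytics-plus-evaluation loop described above. Everything in it is hypothetical: the query log, the `RULES` checks, and `fake_model`, which picks randomly from canned answers to mimic non-determinism instead of calling a real LLM. The analytics step (`top_questions`) surfaces the questions users ask most often. The evaluation step (`evaluate`) runs each one many times and reports a pass rate, because a single sample says little about a non-deterministic model.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical analytics log: prompts users sent to an LLM-enabled product.
QUERY_LOG = [
    "dad joke about gravity",
    "dad joke about cats",
    "dad joke about gravity",
    "dad joke about gravity",
    "dad joke about cats",
    "dad joke about taxes",
]

# Hypothetical rule-based checks, one per evaluation case.
RULES: dict[str, Callable[[str], bool]] = {
    "dad joke about gravity": lambda out: "gravity" in out.lower(),
    "dad joke about cats": lambda out: "cat" in out.lower(),
    "dad joke about taxes": lambda out: "tax" in out.lower(),
}

# Canned answers for a stand-in model; a real product would call an LLM here.
CANNED: dict[str, list[str]] = {
    "dad joke about gravity": [
        "I'm reading a book about gravity & it's impossible to put down.",
        "Why did the apple fall? It was feeling down.",
    ],
    "dad joke about cats": [
        "What do you call a pile of cats? A meowtain.",
        "My cat was a terrible storyteller: only one tail.",
    ],
    "dad joke about taxes": ["Taxes are just fees with extra paperwork."],
}


def top_questions(log: list[str], k: int) -> list[str]:
    """Analytics step: surface the k questions users ask most often."""
    return [question for question, _ in Counter(log).most_common(k)]


def fake_model(prompt: str, rng: random.Random) -> str:
    """Non-deterministic stand-in: the same prompt can yield different answers."""
    return rng.choice(CANNED[prompt])


def evaluate(
    model: Callable[[str, random.Random], str],
    cases: list[str],
    n_samples: int = 50,
    seed: int = 0,
) -> dict[str, float]:
    """Evaluation step: sample each case repeatedly and report its pass rate."""
    rng = random.Random(seed)
    results: dict[str, float] = {}
    for prompt in cases:
        passes = sum(RULES[prompt](model(prompt, rng)) for _ in range(n_samples))
        results[prompt] = passes / n_samples
    return results


if __name__ == "__main__":
    cases = top_questions(QUERY_LOG, k=2)
    for prompt, rate in evaluate(fake_model, cases).items():
        print(f"{prompt}: {rate:.0%} pass rate")
```

The gravity case passes only some of the time because one canned answer never mentions gravity. That is the kind of inconsistency a one-shot benchmark would miss. Each check shares the signature `str -> bool`, so a model-based judge could in principle be dropped in for a rule, though that substitution is not shown here.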