The Sequence Chat: Raza Habib, Humanloop on Building LLM-Driven Applications
Humanloop is one of the emerging platforms that allow developers to build large-scale applications on top of LLMs.

👤 Quick bio
I’m the co-founder and CEO of Humanloop. We help developers build reliable applications on top of LLMs like GPT-4. I first got interested in machine learning when I was a physics undergrad at Cambridge and saw Professor Sir David MacKay's lectures on information theory and learning algorithms. The idea of building intelligent learning systems fascinated me and I was immediately hooked. I was excited both by the potential applications of the technology and also by the dream that we might understand how brains work. Later, during my PhD, the rate of progress in AI and NLP totally staggered me. Things that I didn't expect to happen for decades kept happening every year, and it feels like it's only been accelerating since then! I initially studied physics, and at the start of the 20th century all the smartest people were drawn to the problems of quantum mechanics. Today, it seems to me, the most exciting and challenging problems are in AI.

🛠 ML Work
I’ve believed for a long time now that foundational AI models, like GPT-3/4, are the start of the next big computing platform. Developers building on top of these models will be able to build a new generation of applications that until recently would have felt like science fiction. We’ve already seen examples of these in the form of intelligent assistants like ChatGPT, or GitHub Copilot for software, but these are just the beginning. We've worked closely with some of the earliest adopters of GPT-3 to understand the challenges faced when working with this powerful new technology. Repeatedly we heard that prototyping was easy but getting to production was hard. Evaluation is subjective and difficult. Prompt engineering was more art than science. Models hallucinate and are hard to customise. To unlock the potential of LLM applications we need a new set of tools built from first principles. At Humanloop, we’ve been building the tools needed to take the raw intelligence of a Large Language Model and wrangle that into a differentiated and reliable product. Our vision is to empower millions of developers to build novel and useful apps and products with LLMs.
OpenAI pioneered the techniques needed to train instruction-following models, and the main steps and workflow are largely unchanged. There are three steps:

1. Pretraining a base model on large amounts of text with a next-token prediction objective.
2. Supervised finetuning on human-written demonstrations of instruction following.
3. Reinforcement learning from human feedback (RLHF), where a reward model trained on human preference comparisons is used to further optimize the model.
After supervised finetuning (step 2) the models are quite good at instruction following but RLHF provides much more feedback data and allows the models to learn more abstract human preferences, like a preference for honest answers.
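The preference-learning step at the heart of RLHF can be illustrated with the pairwise loss used to train the reward model. This is a minimal sketch, not Humanloop's or OpenAI's code: it shows only the standard Bradley–Terry objective, with plain floats standing in for a real reward model's scores.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss for training an RLHF reward model.

    The loss is small when the reward model scores the human-preferred
    completion higher than the rejected one, and large otherwise.
    """
    # -log(sigmoid(r_chosen - r_rejected))
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the reward model already agrees with the human ranking, the loss
# is small; when it disagrees, the loss is large, pushing scores apart.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagrees = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)

print(f"model agrees with human: loss = {agrees:.3f}")
print(f"model disagrees:         loss = {disagrees:.3f}")
```

Minimizing this loss over many human comparisons teaches the reward model to rank completions the way people do, which is what lets RLHF capture abstract preferences (like honesty) that are hard to specify with a supervised target.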
One of the hardest parts of building with LLMs is that evaluation is much more subjective than in traditional software or machine learning. When you’re building a coding assistant, sales coach or personal tutor, it’s not straightforward to say what the “correct” answer actually is. You can get moderately far using traditional machine learning metrics like ROUGE but we’ve found that by far the best signal of performance is human feedback. This feedback can be generated during development from an internal team but it’s particularly important to capture feedback data in production based on how users actually respond to the model’s behavior. We’ve seen three types of feedback be particularly useful:
The feedback data you collect in production allows you to monitor performance and also to take actions to improve models over time (e.g. through finetuning). Another common best practice for evaluating and monitoring models is to use a second LLM to score the generations from your application. In practice, evaluation is a much easier task than generation, and LLMs provide surprisingly accurate scoring information.
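The LLM-as-judge pattern described above can be sketched as follows. This is a minimal illustration, not Humanloop's implementation: `call_llm` is a hypothetical stand-in for a real model API, and the rubric wording and score parsing are assumptions.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call.

    Returns a canned score so the example is runnable; in practice this
    would call a capable model such as GPT-4 with the rubric prompt.
    """
    return "4"

def judge_generation(user_input: str, generation: str) -> int:
    """Ask a second LLM to score a generation on a 1-5 rubric."""
    rubric = (
        "Rate the assistant's answer from 1 (poor) to 5 (excellent) for "
        "helpfulness and factual accuracy. Reply with a single digit.\n\n"
        f"User input: {user_input}\n"
        f"Assistant answer: {generation}\n"
        "Score:"
    )
    raw = call_llm(rubric).strip()
    # Parse the leading digit; production code should validate the range
    # and handle malformed judge outputs.
    return int(raw[0])

score = judge_generation(
    user_input="What is the capital of France?",
    generation="The capital of France is Paris.",
)
print(f"judge score: {score}")
```

Scores like these can be logged alongside the explicit and implicit user feedback collected in production, giving a continuous signal for monitoring and for selecting data to finetune on.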
The trends that excite me most are parameter-efficient finetuning, larger context windows, and multi-modality. The context window is the number of “tokens” (similar to words) a model can “read” before generating a response. Today’s models can’t learn new things after training, so any new information needs to be included in the context window. Many applications today are limited by the size of this context window, but I think we can reasonably expect much longer contexts in the future. Parameter-efficient finetuning methods like LoRA make it cost-effective to finetune LLMs yourself. This will enable a lot of developers to train private models and enable products that are privacy-sensitive or need a lot of personalization. Language models do surprisingly well on questions that require world knowledge despite having only seen text, but this is a severe limitation on actual understanding. Models trained on images, text, audio, video, etc. are a natural next step and will allow a much richer understanding of the world.

💥 Miscellaneous – a set of rapid-fire questions
I find this question hard to answer because I think ultimately most of AI is actually generative AI. Taken in its broadest sense, generative AI is trying to learn the full probability distribution of a dataset from unlabelled data. Once this distribution is learned it can be used for discriminative tasks like classification, for sampling (generation) and even for reasoning and compression. So I actually think generative AI is not really distinct from AI writ large.
I think both strands are important and both will win in different ways. Open source enables permissionless innovation and will drive a lot of creativity. For many use cases, existing models are smart enough and the real challenges are product challenges or privacy, latency, and cost. Open-source models will help a lot here. This may even be the majority of use cases by number. However, there are valuable use cases that are well beyond the capabilities of existing models, e.g. scientific research. To get to these capabilities we’ll have to build much more powerful models that will require investment beyond what OSS can support. Those model capabilities also become increasingly dangerous in the hands of bad actors and will likely not be safe to open-source.
I think it almost certainly requires a new stack. It’s fundamentally a new paradigm of software and is just getting going!
Multimodality, larger context lengths, and better reasoning are big milestones. GPU compute and talent are the main bottlenecks. On a 5-year time horizon, I think it's conceivable we'll see capabilities quite close to AGI.