The Sequence Chat: Raza Habib, Humanloop on Building LLM-Driven Applications
Humanloop is one of the emerging platforms that allow developers to build large-scale applications on top of LLMs.

👤 Quick bio
I’m the co-founder and CEO of Humanloop. We help developers build reliable applications on top of LLMs like GPT-4. I first got interested in machine learning when I was a physics undergrad at Cambridge and saw Professor Sir David MacKay's lectures on information theory and learning algorithms. The idea of building intelligent learning systems fascinated me and I was immediately hooked. I was excited both by the potential applications of the technology and by the dream that we might understand how brains work. Later, during my PhD, the rate of progress in AI and NLP totally staggered me. Things that I didn't expect to happen for decades kept happening every year, and it feels like it's only been accelerating since then! I initially studied physics, and at the start of the 20th century all the smartest people were drawn to the problems of quantum mechanics. Today, it seems to me, the most exciting and challenging problems are in AI.

🛠 ML Work
I’ve believed for a long time now that foundational AI models, like GPT-3/4, are the start of the next big computing platform. Developers building on top of these models will be able to build a new generation of applications that until recently would have felt like science fiction. We’ve already seen examples of these in the form of intelligent assistants like ChatGPT, or GitHub Copilot for software, but these are just the beginning. We've worked closely with some of the earliest adopters of GPT-3 to understand the challenges they faced when working with this powerful new technology. Repeatedly we heard that prototyping was easy but getting to production was hard. Evaluation is subjective and difficult. Prompt engineering was more art than science. Models hallucinate and are hard to customise. To unlock the potential of LLM applications we need a new set of tools built from first principles. At Humanloop, we’ve been building the tools needed to take the raw intelligence of a large language model and wrangle it into a differentiated and reliable product. Our vision is to empower millions of developers to build novel and useful apps and products with LLMs.
OpenAI pioneered the techniques needed to train instruction-following models and the main steps and workflow are largely unchanged. There are three steps:

1. Pretrain a base model on a very large corpus of internet text.
2. Supervised finetuning: finetune the base model on human-written demonstrations of the desired behaviour.
3. RLHF: train a reward model from human preference comparisons and use reinforcement learning to optimise the finetuned model against it.
After supervised finetuning (step 2) the models are quite good at instruction following but RLHF provides much more feedback data and allows the models to learn more abstract human preferences, like a preference for honest answers.
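As a rough illustration of how preference data feeds into RLHF (a minimal sketch of the standard Bradley–Terry style objective, not Humanloop's or OpenAI's actual code), the reward model is typically trained to assign a higher score to the response humans preferred in each comparison pair:

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss used to train a reward model.

    Computes -log(sigmoid(r_chosen - r_rejected)): the loss is small
    when the model scores the human-preferred response higher, and
    large when the ranking is inverted.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the preferred response already scores higher, the loss is low...
low = preference_loss(2.0, -1.0)
# ...and when the ranking is inverted, the loss is high.
high = preference_loss(-1.0, 2.0)
print(low < high)  # True
```

The reward model learned this way then provides the training signal for the reinforcement-learning step, which is how preferences like "be honest" get distilled into the model.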
One of the hardest parts of building with LLMs is that evaluation is much more subjective than in traditional software or machine learning. When you’re building a coding assistant, sales coach or personal tutor, it’s not straightforward to say what the “correct” answer actually is. You can get moderately far using traditional machine learning metrics like ROUGE, but we’ve found that by far the best signal of performance is human feedback. This feedback can be generated during development by an internal team, but it’s particularly important to capture feedback data in production based on how users actually respond to the model’s behavior. We’ve seen three types of feedback be particularly useful.
The feedback data you collect in production allows you to monitor performance and also to take actions to improve models over time (e.g. through finetuning). Another common best practice for evaluating and monitoring models is to use a second LLM to score the generations from your application. In practice, evaluation is a much easier task than generation, and LLMs provide surprisingly accurate scoring information.
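The LLM-as-judge pattern described above can be sketched as follows. This is a hypothetical illustration (the prompt wording, score scale and helper names are my own, not a Humanloop API): one LLM's output is wrapped in a grading prompt for a second "judge" model, whose reply is then parsed into a numeric score you can log and monitor.

```python
import re

def build_judge_prompt(question, answer):
    """Compose a grading prompt for a second 'judge' LLM.

    The judge model itself would be called elsewhere; this just shows
    the pattern of asking one LLM to score another's output.
    """
    return (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rate the answer's helpfulness from 1 to 10. "
        "Reply with 'Score: <number>' only."
    )

def parse_judge_score(judge_reply):
    """Extract the numeric score from the judge model's reply,
    or None if the reply doesn't follow the requested format."""
    match = re.search(r"Score:\s*(\d+)", judge_reply)
    return int(match.group(1)) if match else None

# Example: parse a (hypothetical) judge reply into a number for monitoring.
print(parse_judge_score("Score: 8"))  # 8
```

In production these per-generation scores would be aggregated alongside the human feedback signals discussed above.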
The trends that excite me most are parameter-efficient finetuning, larger context windows and multi-modality. The context window is the amount of “tokens” (similar to words) a model can “read” before generating a response. Today’s models can’t learn new things after training, so any new information needs to be included in the context window. Many applications today are limited by the size of this context window, but I think we can reasonably expect much longer contexts in the future. Parameter-efficient finetuning methods like LoRA make it cost-effective to finetune LLMs yourself. This will enable a lot of developers to train private models and enable products that are privacy-sensitive or need a lot of personalization. Language models do surprisingly well on questions that require world knowledge despite only having seen text, but this is a severe limitation on actual understanding. Models trained on images, text, audio, video, etc. are a natural next step and will allow a much richer understanding of the world.

💥 Miscellaneous – a set of rapid-fire questions
I find this question hard to answer because I think ultimately most of AI is actually generative AI. Taken in its broadest sense, generative AI is trying to learn the full probability distribution of a dataset from unlabelled data. Once this distribution is learned it can be used for discriminative tasks like classification, for sampling (generation) and even for reasoning and compression. So I actually think generative AI is not really distinct from AI writ large.
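The point that a learned distribution can be reused for discrimination can be made concrete with a toy generative classifier (my own illustration, not something from the interview): model each class's data with a 1-D Gaussian, then classify a new point by asking which class's model assigns it the highest probability, via Bayes' rule.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian: the 'generative' model of each class."""
    coeff = 1.0 / (std * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

def classify(x, classes):
    """Pick the class maximising p(x | class) * p(class), i.e. use the
    learned generative densities as a discriminative classifier."""
    best = max(classes, key=lambda c: gaussian_pdf(x, c["mean"], c["std"]) * c["prior"])
    return best["name"]

# Two classes modelled generatively (heights in cm, illustrative numbers);
# the same densities that could generate samples also classify new points.
classes = [
    {"name": "short", "mean": 160.0, "std": 7.0, "prior": 0.5},
    {"name": "tall",  "mean": 185.0, "std": 7.0, "prior": 0.5},
]
print(classify(163.0, classes))  # "short"
```

The same fitted densities could equally be sampled from to generate new data, which is the sense in which generative modelling subsumes the discriminative task.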
I think both strands are important and both will win in different ways. Open source enables permissionless innovation and will drive a lot of creativity. For many use cases, existing models are smart enough and the real challenges are product challenges or privacy, latency and cost. Open-source models will help a lot here. This may even be the majority of use cases by number. However, there are valuable use cases that are well beyond the capabilities of existing models, e.g. scientific research. To get to these capabilities we’ll have to build much more powerful models that will require investment beyond what OSS can support. These model capabilities also become increasingly dangerous in the hands of bad actors and will likely not be safe to open-source.
I think it almost certainly requires a new stack. It’s fundamentally a new paradigm of software and is just getting going!
Multimodality, larger context lengths and better reasoning are big milestones. GPU compute and talent are the main bottlenecks. On a 5-year time horizon I think it's conceivable we'll see capabilities quite close to AGI.