📝 Guest Post: The Coming Wave of Specialized AI*
In this blog post, Piero Molino, author of Ludwig.ai and CSO & Co-Founder at Predibase, makes the case for why smaller, faster, fine-tuned LLMs are poised to take over from large, general AI models. Training your own model used to require a huge upfront cost, but open-source models can now be fine-tuned on a few thousand examples, a small investment that is quickly amortized compared with using general models, which, because of their generality, are very large and expensive. Consequently, companies are adopting a new AI development process that makes it possible to realize business value from AI much faster, and you can adopt it too. If you'd like to learn more about specialized AI and fine-tuning, join Predibase's November 7th webinar "Fine-Tune and Serve 100s of LLMs for the Cost of One with LoRAX".

The rise of general AI

AI has been in the spotlight since late 2022, thanks to groundbreaking generative services such as Midjourney and ChatGPT, which introduced remarkable capabilities in image generation and text-based interaction. Their success is largely due to how easy they have made AI to use for anyone, not just developers or machine learning experts, through user-friendly interfaces and APIs that eliminated barriers to entry. The simplicity of integrating commercial AI APIs into software applications ignited a wave of excitement and innovation among hackers, entrepreneurs, and businesses, leading to rapid prototyping and widespread adoption.

General AI shortcomings

Today, after a year of experimentation, companies that have integrated AI into their applications are recognizing the challenges posed by the high costs, limited throughput, increased latency, and privacy and ownership concerns associated with commercial AI APIs. The generality of these APIs, which initially benefited consumers, is becoming a drawback. Why would an enterprise with specific business tasks pay the cost of a large model capable of generating French poetry when a specialized, smaller, faster, task-specific model can often outperform it at a lower cost? General models are similar to CPUs, capable of handling a variety of tasks reasonably well, while specialized models resemble accelerators such as GPUs or ASICs: they may lack the same versatility, but they excel at specific tasks, use resources better, and are more cost-effective.

The reason general large models have been preferred is that they need no data collection and can be used right away, while building your own model has historically required significant amounts of training data, resulting in much higher upfront costs. This is no longer true: fine-tuning is emerging as the mechanism for obtaining specialized models for specific tasks without requiring huge amounts of data (a few thousand examples are sufficient). Fine-tuning enables any organization to have its own specialized GPT without breaking the bank.

Advantages of commercial LLMs vs. concerns related to commercial LLMs
Fine-tuned models outperform general models

While the shortcomings of large general models may be tolerable for prototypes and experimentation, they quickly become unsustainable as applications reach production volumes. This dilemma leaves practitioners with limited choices: go back to creating models from scratch, which requires an often prohibitive amount of data and compute, or leverage techniques to fine-tune existing models for task-specific applications.

While headlines have often focused on the latest large language models (LLMs) with ever more parameters, the open-source community has demonstrated the potential of collaboration in closing the performance gap. Models like Llama-v2, Stable Diffusion, and Mistral, along with other LLMs, have made powerful capabilities widely accessible. Moreover, new fine-tuning recipes have made it possible to train smaller models whose performance is competitive with commercial LLMs that use orders of magnitude more parameters, as in the case of Alpaca 7B: a highly performant instruction-following language model created by Stanford researchers by fine-tuning LLaMA 7B on just 52K task-specific instructions automatically generated using larger LLMs. A similar recipe was used to produce Vicuna, which came even closer to the results of larger models. More recent research (LIMA) has shown that fine-tuning is viable with just 1,000 examples of high-quality data.

It is now well understood that commercial LLMs are not as performant as fine-tuned models on the majority of tasks. As shown by a recent survey covering 151 tasks from 19 academic papers, ChatGPT underperforms fine-tuned baselines on 77.5% of tasks, and the author concluded that "vanilla ChatGPT will be used as a quick prototype for some applications, but it will be replaced by a fine-tuned model (often smaller, for economical reasons) for most production-ready solutions."

Moreover, the best practices for fine-tuning open-source LLMs are now accessible to developers through tools like Ludwig.ai, our open-source declarative ML framework. Ludwig makes fine-tuning easy through a straightforward configuration file and optimizes fine-tuning for readily available hardware like T4 GPUs (the ones provided for free in Google Colab); a minimal configuration sketch appears below. Fine-tuning has become a strong option among the different approaches to building AI, which vary depending on the task details and the data available (figure 1).
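To make the configuration-driven workflow concrete, here is a minimal sketch of LoRA-based fine-tuning with Ludwig's Python API. It assumes Ludwig 0.8 or later and a local CSV file with instruction and output columns; the base model, hyperparameters, and file name are illustrative choices, not recommendations from the post.

```python
# Minimal sketch: LoRA fine-tuning of an open-source LLM with Ludwig's Python API.
# Assumes Ludwig 0.8+ and a local CSV with "instruction" and "output" columns;
# the base model, hyperparameters, and file name are placeholders, not prescriptions.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",  # any Hugging Face causal LM the hardware can hold
    "quantization": {"bits": 4},               # load in 4-bit so a single T4 can fit a 7B model
    "adapter": {"type": "lora"},               # train a small LoRA adapter instead of all weights
    "prompt": {
        "template": "### Instruction: {instruction}\n### Response:"
    },
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "trainer": {
        "type": "finetune",
        "epochs": 3,
        "batch_size": 1,
        "gradient_accumulation_steps": 16,
        "learning_rate": 2e-4,
    },
}

model = LudwigModel(config=config)
results = model.train(dataset="instruction_examples.csv")  # a few thousand rows is enough
```

Because the workflow is declarative, swapping the base model or the adapter type is a one-line change to the configuration rather than new training code.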
Over time, organizations that start without data and use general models will begin collecting data. The tasks they perform will shift toward the left side of the distribution in figure 1, quickly making fine-tuning the most attractive option. Simultaneously, new general models will become able to solve new tasks, extending the right end of the distribution in figure 1. The enterprises in the best position will be those that adopt tools capable of supporting the techniques required across the entire task distribution. General models, even when used with in-context learning, reduce training costs to virtually zero at the beginning but come with higher inference expenses. In contrast, fine-tuned models are not only cost-effective to train but also economical to deploy, making them the most financially efficient choice, with the highest return on investment and the lowest total cost of ownership.

Left chart: While a smaller open-source model may not meet performance requirements out of the box, fine-tuning it with even small amounts of high-quality data (10^4 data points) improves performance dramatically and makes it competitive even with models trained from scratch on orders of magnitude more data (10^7 data points). Right chart: While training from scratch has a large upfront cost, a general model like GPT-4 is ready to use out of the box but is expensive to use at scale. Fine-tuned open-source models are incredibly economical, with very low upfront costs. If you deploy using LoRAX rather than giving every fine-tuned model its own set of dedicated GPU resources, you can substantially reduce total cost of ownership compared to a model trained from scratch or a commercial LLM.

A new model development lifecycle

For the past 15 years, a significant initial effort was required for data collection and labeling before even considering training a machine learning model. Now we are observing our customers adopt a new process. Teams start by using a general model, either commercial APIs or open source, to build a first prototype of an AI product. This significantly reduces the barriers to entry and the cold-start problem and enables quick validation. Use of the prototype generates data that can be collected and used to fine-tune specialized models. The process then repeats with the specialized model, improving performance with each iteration. General models serve as the starting point for a v0 product, essentially igniting the engine that drives the iterative refinement of specialized AI models.

By adopting this process, engineers enjoy the best of both worlds: they can quickly realize value through general AI and then optimize for cost, accuracy, and speed through specialized models, and this optimization happens only once the AI product has already demonstrated value. From the ongoing adoption of this process, we derive a clear vision of a near future in which specialized, fine-tuned LLMs tailored to specific tasks are ubiquitous, unlocking new levels of efficiency, speed, and cost-effectiveness for all companies developing AI products.

How Predibase makes this AI development lifecycle easy

Predibase provides open-source AI infrastructure for developers, built on top of the popular open-source framework Ludwig. The Predibase platform allows developers to efficiently fine-tune and serve open-source LLMs on scalable managed infrastructure with just a few lines of code.
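As a rough illustration of the LoRAX-style serving described next, here is a minimal sketch of querying several fine-tuned adapters against one shared deployment. It assumes a LoRAX server is already running locally with a compatible open-source base model; the server URL, adapter IDs, and prompt are hypothetical placeholders, and the client calls follow the open-source lorax-client package as I understand it rather than a verbatim Predibase recipe.

```python
# Minimal sketch: querying several fine-tuned adapters on one shared LoRAX deployment.
# Assumes a LoRAX server is already running at the URL below with an open-source base
# model, and that the adapter IDs point to LoRA adapters you have trained.
# The URL, adapter IDs, and prompt are placeholders, not real resources.
from lorax import Client  # pip install lorax-client

client = Client("http://127.0.0.1:8080")

prompt = (
    "### Instruction: Classify the sentiment of this review: "
    "'The battery died after an hour.'\n### Response:"
)

# Each request names a different adapter; LoRAX loads it on the fly onto the shared
# base model, so many specialized models can share one set of GPUs.
for adapter_id in ["acme/sentiment-lora", "acme/support-triage-lora"]:
    response = client.generate(prompt, adapter_id=adapter_id, max_new_tokens=64)
    print(adapter_id, "->", response.generated_text)
```

Because only small adapter weights are swapped per request, adding another fine-tuned model is a configuration change rather than a new deployment, which is where the cost savings described below come from.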
Models are served through a dynamic serving infrastructure that adjusts automatically to match the requirements of your production environment. It can serve numerous fine-tuned LLMs simultaneously, leading to more than a 100x reduction in cost compared to dedicated deployments, and you can load and query these models in a matter of seconds. Try Predibase for free to start fine-tuning and serving open-source LLMs for your tasks.

*This post was written by Piero Molino, author of Ludwig.ai, CSO & Co-Founder at Predibase, specially for TheSequence. We thank Predibase for their insights and ongoing support of TheSequence.