Welcome to the World of Small(er) Language Models
Smaller, highly specialized and cost-effective LLMs are a trend to track in generative AI.
📝 Editorial: Welcome to the World of Small(er) Language Models

Large language models (LLMs) have led the generative AI revolution in recent years. Questions about the scaling limits of LLMs, and whether scaling is the only path forward, are a source of constant debate in the generative AI community. Recently, another term has emerged to counter the thesis that "bigger is better" when it comes to LLMs: small (or smaller) language models (SLMs). The SLM thesis centers on the viability of smaller, highly specialized, more affordable models for specific use cases. This movement has been partly catalyzed by the rise of open-source generative AI models. When theorizing about the future of open-source vs. closed-source models, there are two main universes to explore:
SLMs are the first manifestation of the second theory. Most companies can sacrifice a bit of the quality of models like GPT-4 or Claude in order to gain more control over fine-tuning and optimization, and to reduce costs. Microsoft and Meta have emerged as champions of the SLM movement. In the last two weeks, the Redmond giant announced Phi-2, an SLM highly specialized in mathematical reasoning and the second iteration of the ideas outlined in the "Textbooks Are All You Need" paper. Microsoft also announced Orca 2, an SLM hyper-optimized for reasoning tasks such as common-sense reasoning, math problem solving, reading comprehension, and several others. SLMs are likely to become a force to be reckoned with in generative AI. As LLMs keep pushing the scaling laws and become bigger and bigger, we should ask ourselves: how small is really small for an SLM?

🔎 ML Research

Orca 2
Microsoft Research published a paper detailing Orca 2, the second version of a small language model that exhibits stronger reasoning capabilities than much larger alternatives. The model is created by fine-tuning Llama 2 with a sophisticated synthetic reasoning dataset —> Read more.

Transformers and Composability
Researchers from the Allen Institute for Artificial Intelligence published a paper exploring the limits of transformer models on compositional problems. The paper explores tasks such as multiplication, logic grid puzzles, and a classic dynamic programming problem that have traditionally proven challenging for transformers —> Read more.

LLM Editing
Microsoft Research published a paper exploring three fundamental types of LLM editing techniques. These methods target small modifications in LLMs that can optimize the behavior of models without changing their fundamental architecture —> Read more.
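The Orca 2 recipe mentioned above, fine-tuning a smaller model on teacher-generated reasoning traces, can be sketched at the data-preparation level. This is a minimal, hypothetical sketch: the record layout, prompt template, and function names are assumptions for illustration, not the actual Orca 2 pipeline.

```python
# Sketch of packing a synthetic reasoning dataset into prompt/completion
# pairs for SLM fine-tuning. Layout and template are illustrative only.

def to_training_example(problem: str, teacher_trace: str, answer: str) -> dict:
    """Pack one teacher-generated reasoning trace into a prompt/completion pair."""
    prompt = (
        "Solve the problem step by step, then state the final answer.\n\n"
        f"Problem: {problem}\n"
    )
    completion = f"{teacher_trace}\nFinal answer: {answer}"
    return {"prompt": prompt, "completion": completion}

# A single teacher-generated row: (problem, reasoning trace, final answer).
synthetic_rows = [
    ("If 3x + 2 = 11, what is x?", "3x = 11 - 2 = 9, so x = 9 / 3.", "3"),
]

dataset = [to_training_example(p, t, a) for p, t, a in synthetic_rows]
print(dataset[0]["completion"])
```

The resulting records could then feed any standard supervised fine-tuning loop over a base model such as Llama 2.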
ChatAnything
Researchers from ByteDance and Nankai University published a paper detailing ChatAnything, a model to generate anthropomorphized personas for LLM-based characters. The model incorporates in-context learning capabilities for features such as personality, tone and visual appearance —> Read more.

Lookahead Decoding
LMSys published the research behind lookahead decoding, a parallel decoding algorithm that can accelerate LLM inference. The method is already implemented in the Hugging Face Transformers library and leads to significant performance improvements in token generation —> Read more.

🤖 Cool AI Tech Releases

Claude 2.1
Anthropic released a new version of Claude with an astonishing 200k token window —> Read more.

Stable Video
Stability AI open sourced Stable Video, a generative video model based on Stable Diffusion —> Read more.

Phi-2
Microsoft's Phi-2 model for mathematical reasoning is now available —> Read more.

🛠 Real World ML

Python at Meta
Meta shares insights about the architecture and best practices supporting high-scale Python workloads —> Read more.

📡 AI Radar
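The draft-and-verify intuition behind lookahead-style decoding, covered in the research section above, can be illustrated with a toy loop. Everything here is a stand-in: `BIGRAM` plays the role of one greedy LLM step and `DRAFT_POOL` a cheap n-gram draft source; the real LMSys algorithm, based on Jacobi iteration, is considerably more sophisticated.

```python
# Toy draft-and-verify loop in the spirit of lookahead/speculative decoding.
# Drafted tokens are accepted only while they match the model's own greedy
# choices; one drafted run can advance generation by several tokens at once.

BIGRAM = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
DRAFT_POOL = {"the": "cat", "cat": "sat", "sat": "down"}  # "down" is a bad guess

def model_next(token: str) -> str:
    """Stand-in for one greedy decoding step of a real LLM."""
    return BIGRAM.get(token, "<eos>")

def draft(token: str, k: int) -> list:
    """Cheaply propose k continuation tokens from an n-gram-style pool."""
    out = []
    for _ in range(k):
        token = DRAFT_POOL.get(token, "<eos>")
        out.append(token)
    return out

def generate(start: str, steps: int, k: int = 3) -> list:
    tokens = [start]
    while len(tokens) < steps:
        cur = tokens[-1]
        # Verify drafted tokens against the model's greedy steps.
        for tok in draft(cur, k):
            if len(tokens) >= steps or model_next(cur) != tok:
                break
            tokens.append(tok)
            cur = tok
        if len(tokens) < steps:
            # Draft diverged (or ran out): take one ordinary model step.
            tokens.append(model_next(tokens[-1]))
    return tokens

print(generate("the", 5))  # -> ['the', 'cat', 'sat', 'on', 'the']
```

The speedup in real systems comes from verifying the drafted tokens in a single batched forward pass rather than one model call per token, which this sequential toy does not capture.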
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.