This Week in Turing Post:
- Wednesday, AI 101, Concepts: Get ready for the next set of ML Flashcards
- Friday, Agentic Workflows series

If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free →
The main topic – When a plateau is actually a fork
Last week was a whirlpool of discussions around scaling laws. The recent performance of OpenAI's "Orion," showing only modest improvements over GPT-4, and rumors of Google’s Gemini falling short of expectations, have sparked conversations about an AI plateau. Marc Andreessen noted that multiple models are "hitting the same ceiling on capabilities," while Ilya Sutskever reflected that "the 2010s were the age of scaling; now we’re back in the age of wonder and discovery." These remarks set off a wave of media and analyst commentary declaring that generative AI has reached a plateau.
Let’s be nerdy and look into the meaning of the word “plateau.” In science, a plateau phase refers to a steady state in a process. In psychology, a plateau can describe a stage where growth or learning appears stagnant, requiring new strategies or approaches to break through.
Generative AI is on a plateau in the psychological sense but not the scientific one: progress from pure scaling looks stagnant, yet the field is nowhere near a steady state. What we need are new strategies and approaches to break through – and many of them already exist or are emerging.
Today, I want to highlight a few important approaches that might be relevant to a breakthrough.
What is Compound AI?
Compound AI systems offer a practical way to address scaling law limitations. Instead of relying solely on larger models, these systems improve efficiency and performance by optimizing resource use and tailoring components to specific tasks. The first instances of "Compound AI" principles – combining multiple models, systems, or tools to solve complex tasks – date back to early research in multi-agent systems and ensemble learning, long before the term "Compound AI" was popularized. These ideas evolved from:
- 1990s: Ensemble learning (e.g., random forests) and multi-agent systems introduced collaborative and model-combining techniques.
- 2010s: Pipeline systems like IBM Watson combined NLP and retrieval models for complex tasks.
- 2020s: Tool-integrated models like Codex and AlphaCode refined these ideas with external tools and iterative approaches.
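To make the system-level idea concrete, here is a minimal sketch of a compound pipeline in Python: a rule-based router dispatches a query either to a tool or to a retrieval-augmented model call, and a cheap verifier gates the result. Every function here (route, math_tool, retrieve, general_llm, verify) is a hypothetical stand-in for a real model or tool, not code from any system mentioned above.

```python
# Minimal compound AI sketch: router + specialized components + verifier.
# All components are hypothetical stand-ins for real models and tools.

def route(query: str) -> str:
    """Rule-based router: choose a component by task type."""
    math_markers = ("sum", "solve", "how many", "compute")
    return "math" if any(m in query.lower() for m in math_markers) else "general"

def math_tool(query: str) -> str:
    """Stand-in for a calculator or code-execution tool."""
    return f"[math-tool result for: {query}]"

def retrieve(query: str) -> str:
    """Stand-in for a retriever over a document store."""
    return f"[top documents for: {query}]"

def general_llm(query: str, context: str = "") -> str:
    """Stand-in for an LLM API call."""
    return f"[LLM answer to: {query} | context: {context}]"

def verify(answer: str) -> bool:
    """Stand-in for a cheap verifier model or heuristic check."""
    return bool(answer.strip())

def compound_answer(query: str) -> str:
    if route(query) == "math":
        answer = math_tool(query)
    else:
        answer = general_llm(query, context=retrieve(query))
    # Fall back to a plain model call if verification fails.
    return answer if verify(answer) else general_llm(query)

print(compound_answer("How many primes are below 20?"))
```

The point is architectural: each box can be swapped or scaled independently, which is exactly the system-level flexibility the compound AI framing argues for.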
Recently, in February 2024, BAIR formally spotlighted Compound AI in their widely cited post “The Shift from Models to Compound AI Systems,” framing it as a system-level paradigm for efficiency and scalability. I was reminded of it by today’s news about F1 and F1-mini, compound AI models excelling in complex reasoning. Early testing indicates that F1 matches or surpasses many closed frontier models in areas such as coding, mathematics, and logic puzzles. Promising, indeed.
Next, What Are We Scaling?
One of the goals of scaling laws is to identify where additional resources yield the greatest improvements. Remember how everybody talked about test-time compute when OpenAI’s o1 launched? OpenAI demonstrated that allowing models to "think longer" during inference significantly improves reasoning performance on complex tasks, such as achieving human-expert accuracy on PhD-level science questions and competitive programming challenges. That’s because test-time compute provides an efficient way to boost performance without significantly increasing model size or data volume, strategically addressing cost-performance trade-offs and pushing the boundaries of what existing models can achieve. OpenAI covers the approach in detail in the post “Learning to Reason with LLMs.”
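OpenAI hasn’t disclosed o1’s mechanism, but one well-known way to spend extra compute at inference is self-consistency: sample several candidate answers and take the majority vote. A minimal sketch, with `sample_answer` as a hypothetical stand-in for a stochastic model call:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one stochastic LLM call (temperature > 0).
    Simulated here as a solver that is right 70% of the time."""
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])

def self_consistency(question: str, n_samples: int = 16) -> str:
    """More samples = more test-time compute = a more reliable majority vote."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```

The knob worth noticing is `n_samples`: accuracy improves as you raise it, with no change to the underlying model – scaling inference instead of parameters.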
Two more important papers about test-time compute are worth checking out if you want to dive deeper:
Instead of concentrating all resources on the training phase, it’s time to optimize and scale inference.
As for the Steady State
We need that as well. Not exponentially growing models but actual utility. Systems need not only reasoning but also the ability to act on their reasoning, leveraging external tools or workflows.
So here’s the fork in the road:
On one side, new approaches to scaling – like test-time compute – show us where additional resources can unlock meaningful gains. On the other side, the age of scaling is giving way to an age of integration, where reasoning meets action through systems that leverage external tools and workflows.
Far from a plateau, this is a transition. We’re moving into uncharted territory, where breakthroughs will come not from growing models indefinitely but from building systems that are smarter, more efficient, and deeply integrated. Sutskever’s right: we’ve stepped out of the shadow of pure scaling and back into the age of wonder and discovery.
(Speaking of integration, here is a very fresh paper, “The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use.” Researchers from the National University of Singapore put Claude 3.5 to the test as a GUI automation agent, tackling 20 real-world desktop tasks – from web navigation to gaming. It nailed planning, GUI actions, and adapting dynamically, handling tasks like adding ANC headphones under $100 to an Amazon cart.)
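If you want to poke at this yourself, Anthropic exposes computer use as a beta tool in its API. The sketch below follows the interface Anthropic published for the October 2024 beta; treat the exact model name, tool type, and beta flag as snapshot-in-time assumptions, and note that it is not the paper’s setup – executing the returned actions (screenshots, clicks, keystrokes) is your own agent loop’s job.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # beta computer-use tool
        "name": "computer",
        "display_width_px": 1280,      # arbitrary example resolution
        "display_height_px": 800,
    }],
    messages=[{
        "role": "user",
        "content": "Find ANC headphones under $100 and add one to the cart.",
    }],
    betas=["computer-use-2024-10-22"],
)

# The model replies with tool_use blocks (screenshot, click, type, ...);
# a real agent loop would execute each one and return the result to the model.
print(response.content)
```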
8 Free Sources to Master Building AI Agents – learn how to build AI agents of different types. Most of them are also suitable for beginners. Invaluable collection! → www.turingpost.com/p/building-ai-agents-sources
Weekly recommendation from an AI practitioner 👍🏼
LLMs work better with Markdown, and so far https://github.com/JohannesKaufmann/html-to-markdown does the best job of converting an entire HTML page to Markdown, rather than just the top few paragraphs, which is the norm for other tools.
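That tool is written in Go; if you live in a Python pipeline and only need a quick (if less thorough) conversion, the `markdownify` package is a common alternative. A minimal sketch, assuming `pip install markdownify requests`:

```python
# HTML -> Markdown in Python with markdownify. Note: this is a substitute,
# not the Go tool recommended above, and may be less thorough on full pages.
import requests
from markdownify import markdownify as md

html = requests.get("https://example.com").text
markdown = md(html, heading_style="ATX")  # "#"-style headings
print(markdown[:500])
```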
Top Research

Image Credit: The original paper
That’s a super fun paper: Game-theoretic LLM: Agent Workflow for Negotiation Games
Researchers investigated whether AI language models could negotiate and play strategic games like humans. They found that while these AIs can be incredibly sophisticated negotiators, they sometimes act irrationally – not because they’re flawed, but because they’re too trusting! When two AIs negotiate, they tend to prioritize cooperation over self-interest, unlike humans, who are typically more strategic →read the paper

Stronger Models Are Not Stronger Teachers for Instruction Tuning
Researchers from the University of Washington and the Allen Institute for AI examined whether larger models improve smaller models during instruction tuning. They introduced the "Larger Models’ Paradox," finding that larger models are not always better teachers than medium-sized ones →read the paper

Generative Agent Simulations of 1,000 People
The study built AI agents to mimic the behaviors of 1,052 people based on interviews and surveys, hitting 85% accuracy in replicating responses. These agents can predict personality traits and social experiment outcomes, reducing bias compared to simpler models. With applications in policymaking and research, this project offers a safe way for scientists to explore human-like simulations while keeping participant data secure →read the paper

Toward Modular Models: Collaborative AI Development Enables Model Accountability and Continuous Learning
Researchers from Microsoft propose modular AI models to address the limitations of monolithic architectures, enabling flexibility, transparency, and efficiency. They emphasize "MoErging," a taxonomy for routing tasks using expert models categorized by design (classifier-based, embedding-based, task-specific, or non-router). Benefits include privacy-compliant contributions, improved extensibility, accountability, and reduced compute costs →read the paper
You can find the rest of the curated research at the end of the newsletter.
We are reading

News from The Usual Suspects ©
More interesting research papers from last week

Advanced Language Models
Model Optimization & Alignment
Multimodal & Vision-Language Models
Hardware & Efficiency
Counterfactuals & Reasoning
Network Automation & Specialized Models
Narrative & Media Processing
Leave a review!

Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!