The Sequence Opinion #499: Reinforcement Learning was Dying and then Gen AI Came Along
Some perspectives about how foundation models inspired a new era in reinforcement learning.

I know, I know, the title is pretty controversial, but hopefully it caught your attention ;) These days we are hearing more about reinforcement learning (RL) in the world of generative AI. To some extent, foundation models have served as a forcing function for the renaissance of RL, which, as an AI method, faced considerable challenges over the last few years.

RL has long been heralded as a general framework for achieving artificial intelligence, promising agents that learn optimal behavior through trial and error. In 2016, DeepMind's AlphaGo victory over a world champion in the complex board game Go stunned the world and raised expectations sky-high. AlphaGo's success suggested that deep RL techniques, combined with powerful neural networks, could crack problems once thought intractable. Indeed, in the aftermath of this breakthrough, many viewed RL as a potential path to artificial general intelligence (AGI), fueling tremendous hype and investment. Yet reality proved more sobering: after AlphaGo, RL's impact beyond controlled settings remained limited, and progress toward broader AI applications stalled.

In recent years, however, RL has experienced a revival, not by conquering new board games but by becoming an integral part of foundation model development. Foundation models such as large language models (e.g. GPT-3, GPT-4) are pretrained on massive datasets via self-supervised learning. While these models acquire vast knowledge and linguistic capability, they initially lack alignment with human preferences and often struggle with complex reasoning or reliability.
RL has reemerged as a powerful tool to fine-tune these foundation models after pre-training, aligning them with human intentions and even improving their problem-solving skills. In this essay, we explore how the RL field went from the heights of AlphaGo's triumph, through a period of tempered expectations, to a renaissance as a critical component in the age of foundation models. We examine the high hopes and subsequent challenges post-AlphaGo, the incorporation of RL in fine-tuning large models (often via reinforcement learning from human feedback), the example of DeepSeek R1 as a landmark in this evolution, and the broader implications of this trend for AI development and deployment.

From AlphaGo to Reality: High Hopes and Limited Success...