Edge 266: The Magic Behind ChatGPT: Reinforcement Learning with Human Feedback
One of the techniques that enabled the ChatGPT breakthrough comes from a 2017 research paper.

A few days ago, the data science community engaged in an intense debate after Yann LeCun, AI legend and Chief AI Scientist at Meta, remarked that ChatGPT was not particularly innovative. Although controversial in light of ChatGPT's almost magical capabilities, the statement is rooted in the fact that many of the ideas behind ChatGPT have been around for a while, and that ChatGPT is more the result of clever implementation than of breakthrough research. One of the key enablers of the ChatGPT magic can be traced back to 2017 under the obscure name of reinforcement learning with human feedback (RLHF).

Large language models (LLMs) have become one of the most interesting environments for applying modern reinforcement learning (RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help to translate that knowledge into actions. That has been the secret behind RLHF...