Edge 266: The Magic Behind ChatGPT: Reinforcement Learning with Human Feedback

One of the techniques that enable the ChatGPT breakthrough comes from a 2017 research paper.

Feb 2

A few days ago, the data science community engaged in an intense debate when Yann LeCun, AI legend and Chief AI Scientist at Meta, remarked that ChatGPT was not particularly innovative. Although controversial in light of ChatGPT's almost magical capabilities, the statement is rooted in the fact that many of the ideas behind ChatGPT have been around for a while, and that ChatGPT is more the result of clever implementation than of breakthrough research. One of the key enablers of the ChatGPT magic can be traced back to 2017, under the obscure name of reinforcement learning with human feedback (RLHF).

Large language models (LLMs) have become one of the most interesting environments for applying modern reinforcement learning (RL) techniques. While LLMs are great at deriving knowledge from vast amounts of text, RL can help translate that knowledge into actions. That has been the secret behind RLHF...
