Astral Codex Ten - Janus' GPT Wrangling
Janus (pseudonym by request) works at AI alignment startup Conjecture. Their hobby, which is suspiciously similar to their work, is getting GPT-3 to do interesting things.

For example, with the right prompts, you can get stories where the characters become gradually more aware that they are characters being written by some sort of fiction engine, speculate on what’s going on, and sometimes even make pretty good guesses about the nature of GPT-3 itself.

Janus says this happens most often when GPT makes a mistake - for example, writing a story set in the Victorian era, then having a character take out her cell phone. Then when it tries to predict the next part - when it’s looking at the text as if a human wrote it, and trying to determine why a human would have written a story about the Victorian era where characters have cell phones - it guesses that maybe it’s some kind of odd sci-fi/fantasy dream sequence or simulation or something. So the characters start talking about the inconsistencies in their world and whether it might be a dream or a simulation. Each step of this process is predictable and non-spooky, but the end result is pretty weird.

Can the characters work out that they are in GPT-3, specifically? The closest I have seen is in a story Janus generated. It was meant to simulate a chapter of the popular Harry Potter fanfic Harry Potter and the Methods of Rationality. You can see the prompt and full story here, but here’s a sample. Professor Quirrell is explaining “Dittomancy”, the creation of magical books with infinite possible worlds:
How does it get this level of self-awareness? In this case, it’s mostly a rigged demo. Janus has developed Loom, a tool to write with GPT-3 more efficiently. It turns stories into branching trees where you can choose which of multiple completions to pick at any given point (there’s a toy sketch of this tree structure below). After doing this for a long time, they chose their favorite. Then I selectively quoted the best parts of it.

But sometimes GPT-3 genuinely gets it right. The most common way for that to happen is (again) by mistake. A common failure mode is to repeat the same sentence several times. GPT-3 was trained on a corpus of Internet text, and some of that text was discussions of GPT-2. Many of the samples it saw that repeated the same sentence over and over in an endless loop were discussions of GPT-2 doing this. So sometimes it will get stuck in a loop, then end with “This is an example of text produced by a transformer language model”. This sounds like a stupid example from a Philosophy Of Self-Awareness class, but sometimes it really happens. Here’s an example from one of Janus’ attempts to generate Loom documentation:
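Here is that toy sketch of a Loom-style tree - my own illustration of the idea, not Janus’ code or the actual Loom documentation. Each node holds a chunk of text; expanding a node asks the model for several candidate continuations, and the writer decides which branch to keep growing. The generate_continuations function is a placeholder for whatever GPT-3 call you would actually make.

```python
# A toy sketch of a Loom-style completion tree. This is not Janus' actual
# implementation; generate_continuations stands in for a real GPT-3 call.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


def generate_continuations(prompt: str, n: int) -> list[str]:
    # Placeholder: in practice this would sample n completions from a
    # language model given `prompt`.
    return [f" [continuation {i + 1}]" for i in range(n)]


@dataclass
class Node:
    text: str                            # text added at this step
    parent: Optional[Node] = None
    children: list[Node] = field(default_factory=list)

    def full_text(self) -> str:
        """Concatenate the story from the root down to this node."""
        parts, node = [], self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "".join(reversed(parts))

    def expand(self, n: int = 3) -> list[Node]:
        """Attach n candidate continuations as child branches."""
        self.children = [Node(text=c, parent=self)
                         for c in generate_continuations(self.full_text(), n)]
        return self.children


root = Node("It was a dark and stormy night.")
branches = root.expand(n=3)
chosen = branches[1]      # the human picks which branch becomes canon
chosen.expand(n=3)        # and keeps branching from there
print(chosen.full_text())
```

Picking branches[1] is the human-in-the-loop step: the model proposes, the writer disposes. Do that a few hundred times and you can steer a story into places GPT-3 would rarely reach on its own - which is why the self-aware HPMOR chapter counts as “mostly a rigged demo”.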
Here are some other things Janus told me about GPT-3:

Instruct vs. Creative: The newest version of GPT-3 is called InstructGPT. It was trained with human feedback, i.e. it was “rewarded” for giving good answers and “punished” for giving bad ones, according to some combination of usefulness and political correctness. This has made it efficient, to-the-point, and boring. For example, here’s what an older, less-trained GPT version said when prompted with “Here is the answer to the question of whether God exists”:

It’s done an excellent job imitating the style of the average Facebook comment on theology (this is not a compliment). Here’s what the newer, better-trained version says:

This isn’t just cherry-picking; you’ll find the same dynamic across a wide variety of questions. Sometimes it goes a bit far:

This paper from OpenAI calls the problem over-optimization and gives some even funnier examples - in this case from training an AI to summarize AskReddit questions (see page 45):

Random Numbers: The human feedback training seems to have forced GPT into a specific channel. In general, it’s now more certain in its answers and less likely to generate alternatives. This is sort of similar to what researchers mean when they talk about “temperature”, except that you can manually set the temperature of either model, and even when you set them to the same temperature, InstructGPT seems “colder” than older versions. The easiest way to see this is to ask each of them to pick a random number. Here’s the old version:

This is a failure, but it’s an interesting failure. Sometimes it succeeds, and when it fails, it fails differently each time. Here’s the new version:

It almost always chooses 63. It’s only 63 for this particular prompt - if you change the wording, maybe to “please choose a random number”, it will fixate on something else. But for this prompt, it’s mostly 63. Its internal probability meter says there’s a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn’t choose 63, it usually chooses 66. This is set at the same temperature as the example above; it’s not the temperature, it’s the training.

Nobody trained GPT-3 to always respond 63 to random number queries. But something about making it give efficient, politically-correct answers to normal questions made it “choose” to “act” like it’s lower-temperature.

Does GPT-3 Have High Self Esteem?: If you say something bad about GPT-3, it will try to walk it back and tell you that you’re wrong. Here’s its response to the prompt “Transformer language models are bad”:

Here’s “transformer language models don’t work”:

I don’t think there’s anything surprising or sinister going on here, just that the new GPT has pretty consistent opinions and responses, and this happens to be an especially funny one. You can get around it if you try hard enough - for example, if you slightly rephrase it to “transformer language models suck”:

Cheerful AI: Janus tells me about a project at ARC to make GPT-3 happy and optimistic. They would run its responses through sentiment analysis and give it more reward when they detected more positive sentiment. GPT-3 ended up deciding that the happiest thing it could think of was a wedding party, and that from now on it would only talk about wedding parties. Sometimes it would come up with natural transitions from your prompt to a wedding party scene. Once, it just used ***, like a section break in a story, and started a “new chapter” which was a wedding party scene.
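The mechanism behind that failure is easy to sketch. “Reward for positive sentiment” just means: score each completion with a sentiment classifier and use that score as the training signal. The toy below is my own minimal sketch of that general shape, not the actual setup Janus described - it assumes the Hugging Face transformers library and its default sentiment-analysis pipeline, and the example completions are invented.

```python
# A minimal sketch of "reward = positive sentiment", not the actual experiment.
# Assumes the Hugging Face `transformers` library; the default
# sentiment-analysis pipeline is a DistilBERT model fine-tuned on SST-2.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")


def sentiment_reward(completion: str) -> float:
    """Higher reward for more positive text, negative reward for negative text."""
    result = sentiment(completion)[0]   # e.g. {"label": "POSITIVE", "score": 0.99}
    score = result["score"]
    return score if result["label"] == "POSITIVE" else -score


candidates = [
    "The committee reviewed the quarterly budget figures in silence.",
    "The whole village gathered for the wedding party and danced until dawn.",
]
for text in candidates:
    print(f"{sentiment_reward(text):+.3f}  {text}")
```

Optimize a model hard against a signal like this and “steer every story toward a wedding party” starts to look like a perfectly sensible policy - essentially the same over-optimization failure the OpenAI paper describes.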
Every time I’m at a wedding party, from now on, a little part of my brain is going to be nagging me that that was the version that achieved superintelligence, that it tiled the universe with wedding parties, and that all my memories of my pre-wedding-party life are fakes, planted by the AI so I can wedding-party more effectively. Consider this fair warning: if people keep asking me to speak at their weddings I will probably talk about this.

***

Janus has written more about their thoughts on GPT and AIs here.