Astral Codex Ten - Janus' GPT Wrangling
Janus (pseudonym by request) works at AI alignment startup Conjecture. Their hobby, which is suspiciously similar to their work, is getting GPT-3 to do interesting things.

For example, with the right prompts, you can get stories where the characters become gradually more aware that they are characters being written by some sort of fiction engine, speculate on what's going on, and sometimes even make pretty good guesses about the nature of GPT-3 itself.

Janus says this happens most often when GPT makes a mistake - for example, writing a story set in the Victorian era, then having a character take out her cell phone. Then when it tries to predict the next part - when it's looking at the text as if a human wrote it, and trying to determine why a human would have written a story about the Victorian era where characters have cell phones - it guesses that maybe it's some kind of odd sci-fi/fantasy dream sequence or simulation or something. So the characters start talking about the inconsistencies in their world and whether it might be a dream or a simulation. Each step of this process is predictable and non-spooky, but the end result is pretty weird.

Can the characters work out that they are in GPT-3, specifically? The closest I have seen is in a story Janus generated. It was meant to simulate a chapter of the popular Harry Potter fanfic Harry Potter and the Methods of Rationality. You can see the prompt and full story here, but here's a sample. Professor Quirrell is explaining "Dittomancy", the creation of magical books with infinite possible worlds:
How does it get this level of self-awareness? In this case, it's mostly a rigged demo. Janus has developed Loom, a tool to write with GPT-3 more efficiently. It turns stories into branching trees where you can choose which of multiple completions to pick at any given point. After doing this for a long time, they chose their favorite. Then I selectively quoted the best parts of it.

But sometimes GPT-3 genuinely gets it right. The most common way for that to happen is (again) by mistake. A common failure mode is to repeat the same sentence several times. GPT-3 was trained on a corpus of Internet text, and some of that text was discussions of GPT-2. Many of the samples it saw that repeated the same sentence over and over in an endless loop were discussions of GPT-2 doing this. So sometimes it will get stuck in a loop, then end with "This is an example of text produced by a transformer language model". This sounds like a stupid example from a Philosophy Of Self-Awareness class, but sometimes it really happens. Here's an example from one of Janus' attempts to generate Loom documentation:
Here are some other things Janus told me about GPT-3:

Instruct vs. Creative: The newest version of GPT-3 is called InstructGPT. It was trained with human feedback, i.e. it was "rewarded" for giving good answers and "punished" for giving bad ones, according to some combination of usefulness and political correctness. This has made it efficient, to-the-point, and boring. For example, here's what an older, less-trained GPT version said when prompted with "Here is the answer to the question of whether God exists":

It's done an excellent job imitating the style of the average Facebook comment on theology (this is not a compliment). Here's what the newer, better-trained version says:

This isn't just cherry-picking; you'll find the same dynamic across a wide variety of questions. Sometimes it goes a bit far:

This paper from OpenAI calls the problem over-optimization and gives some even funnier examples - in this case from training an AI to summarize AskReddit questions (see page 45):

Random Numbers: The human feedback training seems to have forced GPT into a specific channel. In general, it's now more certain in its answers and less likely to generate alternatives. This is sort of similar to what researchers mean when they talk about "temperature", except that you can manually set the temperature of either model, and even when you set them to the same temperature, InstructGPT seems "colder" than older versions.

The easiest way to see this is to ask each of them to pick a random number. Here's the old version:

This is a failure, but it's an interesting failure. Sometimes it succeeds, and when it fails, it fails differently each time. Here's the new version:

It almost always chooses 63. It's only 63 for this particular prompt - if you change the wording, maybe to "please choose a random number", then it will fixate on something else. But for this prompt, it's mostly 63.
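To make the "temperature" comparison concrete, here's a minimal sketch of how temperature works in sampling: you divide the model's logits by the temperature before the softmax. The logits below are made up for illustration - they are not GPT-3's real internals, just four candidate "random numbers" where 63 happens to be the favorite.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by temperature first.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate answers (not real GPT-3 values).
numbers = ["63", "66", "42", "7"]
logits = [1.5, 1.2, 1.0, 0.8]

for t in (1.0, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: P(63) = {probs[0]:.2f}")
```

At T=1.0 the favorite gets about 35% of the probability mass, but at T=0.3 it jumps to about 60% - the model picks its favorite far more often than that answer's "raw" probability would suggest. That's the pattern Janus observed in InstructGPT, except there the sharpening seems baked in by training rather than set by the temperature knob.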
Its internal probability meter says there's a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn't choose 63, it usually chooses 66. This is set at the same temperature as the example above; it's not the temperature, it's the training. Nobody trained GPT-3 to always respond 63 to random number queries. But something about making it give efficient, politically-correct answers to normal questions made it "choose" to "act" like it's lower-temperature.

Does GPT-3 Have High Self Esteem?: If you say something bad about GPT-3, it will try to walk it back and tell you that you're wrong. Here's its response to the prompt "Transformer language models are bad":

Here's "transformer language models don't work":

I don't think there's anything surprising or sinister going on here, just that the new GPT has pretty consistent opinions and responses, and this happens to be an especially funny one. You can get around it if you try hard enough - for example, if you slightly rephrase it to "transformer language models suck":

Cheerful AI: Janus tells me about a project at ARC to make GPT-3 happy and optimistic. They would run its responses through sentiment analysis and give it more reward when they detected more positive sentiment. GPT-3 ended up deciding that the happiest thing it could think of was a wedding party, and that from now on it would only talk about wedding parties. Sometimes it would come up with natural transitions from your prompt to a wedding party scene. Once, it just used ***, like a section break in a story, and started a "new chapter" which was a wedding party scene.

Every time I'm at a wedding party from now on, a little part of my brain is going to be nagging me that that was the version that achieved superintelligence, that it tiled the universe with wedding parties, and that all my memories of my pre-wedding-party life are fakes, planted by the AI so I can wedding-party more effectively.
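The incentive in that Cheerful AI setup is easy to sketch: score each response with a sentiment model, use the score as reward. The keyword scorer below is a deliberately crude stand-in for a real sentiment classifier (ARC's actual setup is not public to me), but it shows the shape of the problem - a policy maximizing this reward learns to steer everything toward the highest-scoring topic.

```python
import re

# Toy stand-in for a sentiment classifier: count positive words minus
# negative words. Word lists and example sentences are invented.
POSITIVE = {"happy", "love", "joy", "wonderful", "wedding", "party", "celebrate"}
NEGATIVE = {"sad", "angry", "terrible", "gloomy"}

def sentiment_reward(text):
    """Reward = positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

candidates = [
    "The detective examined the gloomy crime scene.",
    "It was a terrible, sad day in the village.",
    "Everyone gathered to celebrate the wonderful wedding party with joy.",
]

# A policy trained against this reward gravitates toward whichever
# continuation scores highest - hence, eventually, wedding parties.
best = max(candidates, key=sentiment_reward)
print(best)
```

Once the model finds a topic that reliably maxes out the scorer, there's no gradient pushing it anywhere else - which is exactly the over-optimization failure the OpenAI paper above describes.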
Consider this fair warning: if people keep asking me to speak at their weddings, I will probably talk about this.

***

Janus has written more about their thoughts on GPT and AIs here.