Astral Codex Ten - OpenAI's "Planning For AGI And Beyond"
Planning For AGI And Beyond

Imagine ExxonMobil releases a statement on climate change. It’s a great statement! They talk about how preventing climate change is their core value. They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend. So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible. They even put in firm commitments that people can hold them to. An environmentalist, reading this statement, might have thoughts like:
This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond. OpenAI is the AI company behind ChatGPT and DALL-E. In the past, people (including me) have attacked them for seeming to deprioritize safety. Their CEO, Sam Altman, insists it is definitely a priority, and has recently been sending various signals to that effect. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, i.e. human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If he’s trying to fool people, he should feel proud of himself. He’s doing a very convincing job!

Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right.

This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why you might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1. Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of.

Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago

OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later?

Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous.
For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI.

Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), and gives alignment researchers eight years, not twenty.

So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an ExxonMobil-style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad!

Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later

OpenAI understands the argument about burning timelines. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI.
If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about:
You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this:
Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them. The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed ahead.

So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this:

The Race Argument:
The Compute Argument:
The Fire Alarm Argument:
These three lines of reasoning argue that burning a lot of timeline now might give us a little more timeline later. This is a good deal if:
I’m skeptical of all of these.

DeepMind thought they were establishing a lead in 2010, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with StableDiffusion and MidJourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.

The alignment researchers I’ve talked to say they’ve already got their hands full with existing AIs. Probably they could do better work with more advanced models, but it’s not an overwhelming factor, and they would be happiest getting to really understand what’s going on now before the next generation comes out. One researcher I talked to said the arguments for acceleration made sense five years ago, when there was almost nothing worth experimenting on, but that they no longer think this is true.

Finally, all these arguments for burning timelines require that lots of things go right later. The same AI companies burning timelines now turn into model citizens when the stakes get higher, and convert their lead into improved safety instead of capitalizing on it to release lucrative products. The government responds to an AI crisis responsibly, rather than by ignoring it or making it worse. If someone screws up the galaxy-brained plan, then we burn perfectly good timeline but get none of the benefits.

Why Cynical People Might Think All Of This Is A Sham Anyway

These are interesting arguments. But we should also consider the possibility that OpenAI is a normal corporation, does things for normal corporate reasons (like making money), and releases nice-sounding statements for normal corporate reasons (like defusing criticism).
Brian Chau has an even more cynical take:

My take on the OpenAI obvious Yud/EA flame fanning is that Sam knows AGI timelines are exaggerated and would rather talk about "what if money invested in OpenAI destroys the world" than "what if money invested in OpenAI no stop providing ROI"

OpenAI wants to sound exciting and innovative. If they say “we are exciting and innovative”, this is obvious self-promotion and nobody will buy it. If they say “we’re actually a dangerous and bad company, our products might achieve superintelligence and take over the world”, this makes them sound self-deprecating, while also establishing that they’re exciting and innovative.

Is this too cynical? I’m not sure. On the one hand, OpenAI has been expressing concern about safety since day one - the article announcing their founding in 2015 was titled Elon Musk Just Founded A New Company To Make Sure Artificial Intelligence Doesn’t Destroy The World.

On the other hand - man, they sure have burned a lot of timeline. The one thing everyone was trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so we should just let them do their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave. Even after Musk left, the remaining team did everything to challenge everyone else to a race short of shooting a gun and waving a checkered flag.

Ten people are each given a separate button. If you press the button you get $1 million. If anyone presses a button, there is a 50% chance the world ends one year from now, so same risk if 1 or 10 press. What happens, and how does this game relate to AGI risk?

OpenAI still hasn’t given a good explanation of why they did this.
Absent anything else, I’m forced to wonder if it’s just “they’re just the kind of people who would do that sort of thing” - in which case basically any level of cynicism would be warranted. I hate this conclusion. I’m trying to resist it. I want to think the best of everyone. Individual people at OpenAI have been very nice to me. I like them. They've done many good things for the world. But the rationalists and effective altruists are still reeling from the FTX collapse. Nobody knew FTX was committing fraud, but everyone knew they were a crypto company with a reputation for sketchy cutthroat behavior. But SBF released many well-written statements about how he would do good things and not bad things. Many FTX people were likable and personally very nice to me. I think many of them genuinely believed everything they did was for the greater good. And looking back, I wish I’d had a heuristic something like:
As the saying goes, “if I had a nickel every time I found myself in this situation, I would have two nickels, but it’s still weird that it happened twice.”

What We’re Going To Do Now

Realistically we’re going to thank them profusely for their extremely good statement, then cross our fingers really hard that they’re telling the truth.

OpenAI has unilaterally offered to destroy the world a bit less than they were doing before. They’ve voluntarily added things that look like commitments - some enforceable in the court of public opinion, others potentially in courts of law. Realistically we’ll say “thank you for doing that”, offer to help them turn those commitments into reality, and do our best to hold them to it. It doesn’t mean we have to like them period, or stop preparing for them to betray us. But on this particular sub-sub-topic we should take the W.

For example, they write:
The linked charter clause says:
This is a great start. It raises questions like: Who decides whether someone has a better-than-even chance? Who decides what AGI means here? Who decides which other projects are value-aligned and safety-conscious? A good followup would be to release firmer trigger-action plans on what would activate their commitments and what form their response would take, to prevent goalpost-moving later. They could come up with these themselves, or in consultation with outside experts and policy researchers. This would be the equivalent of ExxonMobil saying they’ll switch to environmentalist mode at the exact moment that warming passes 1.5 degrees C - maybe still a little strange, but starting to sound genuinely promising. (!create #reminders "check if this ever went anywhere" date 2024/03/01) Their statement continues:
Reading between the lines, this sounds like a reference to the new ARC Evals Project, where some leading alignment researchers and strategists have gotten together to work on “benchmarks” for safety.

Reading even further between the lines - at this point it’s total guesswork - OpenAI’s corporate partner Microsoft asked them for a cool AI. OpenAI assumed Microsoft was competent - they make Windows and stuff! - and gave them a rough draft of GPT-4. Microsoft was not competent, skipped fine-tuning and many other important steps which OpenAI would not have skipped, and released it as the Bing chatbot. Bing got in trouble for threatening users, which gave OpenAI a PR headache. Some very smart alignment people realized this was the right time to cash in their political capital with OpenAI and asked them to cooperate with ARC Evals. OpenAI decided (for a mix of altruistic and selfish reasons) to accept, hence this document.

If that’s even slightly true, it’s a really encouraging sign. Where OpenAI goes, other labs might follow. The past eight years of OpenAI policy have been far from ideal. But this document represents a commitment to move from safety laggard to safety model, and I look forward to seeing how it works out.