Astral Codex Ten - The Extinction Tournament
This month's big news in forecasting: the Forecasting Research Institute has released the results of the Existential Risk Persuasion Tournament (XPT). XPT was supposed to use cutting-edge forecasting techniques to develop consensus estimates of the danger from various global risks like climate change, nuclear war, etc.

The plan was: get domain experts (eg climatologists, nuclear policy experts) and superforecasters (people with a proven track record of making very good predictions) in the same room. Have them talk to each other. Use team-based competition with monetary prizes to incentivize accurate answers. Between the domain experts' knowledge and the superforecasters' prediction-making ability, they should be able to converge on good predictions.

They didn't. In most risk categories, the domain experts predicted higher chances of doom than the superforecasters. No amount of discussion could change minds on either side.

The tournament asked about two categories of global disaster. "Catastrophe" meant an event that killed >10% of the population within five years. It's unclear whether anything in recorded history would qualify; Genghis Khan's hordes and the Black Plague each killed about 20% of the global population, but both events were spread out over a few decades. "Extinction" meant reducing total human population below 5,000 (it didn't require literal extinction). This is very hard! Nuclear war is very unlikely to do this; people in bunkers or on remote islands would survive at least the original blasts, and probably any ensuing winter. Even the worst pandemic might not get every remote island or uncontacted Amazonian tribe. Participants assigned the highest literal-extinction risk to AI, maybe because it can deliberately hunt down survivors.

You might notice that all of these numbers are pretty low! I've previously said I thought there was a 33% chance of AI extinction alone (and lots of people are higher than me). Existential risk expert Toby Ord estimated a 16% total chance of extinction by 2100, which is 16x higher than these superforecasters and 2.5x higher than these domain experts. In some sense, this is great news. These kinds of expert + superforecasting tournaments seem trustworthy. Should we update our risk of human extinction downward?

Cancelling The Apocalypse?

It's weird that there's so much difference between experts and superforecasters, and awkward for me that both groups are so far away from my own estimates and those of people I trust (like Toby). Is there any reason to doubt the results?

Were the incentives bad?

The subreddit speculates about this - after all, you can't get paid, or congratulated, or given a trophy, if the world goes extinct. Does that bias superforecasters - who are used to participating in prediction markets and tournaments - downward? What about domain experts, who might be subconsciously optimizing for prestige and reputation?

This tournament tried to control for that in a few ways. First, most of the monetary incentives were for things other than predicting extinction. There were incentives for making good arguments that persuaded other participants, for correctly predicting intermediate steps to extinction (for example, a small pandemic, or a limited nuclear exchange), and for correctly guessing what other people would guess (this technique, called "reciprocal scoring", has been validated in past experiments). Second, this wasn't really an incentive-based prediction market.
Although they kept a few incentives as described above, it was mostly about asking people who had previously demonstrated good predictive accuracy to give their honest impressions. At some point you just have to trust that, absent incentives either way, reasonable people with good track records can be smart and honest.

Third, a lot of the probabilities here were pretty low. For example, the superforecasters got an 0.4% probability of AI-based extinction, compared to the domain experts' 3%. At these levels it's probably not worth optimizing your answers super-carefully to get a tiny amount of extra money or credibility. If it's the year 2100, and we didn't die from AI, who was right - the people who said there was a 3% chance, or the people who said there was an 0.4% chance? Everyone in this tournament was smart enough to realize that survival in one timeline wouldn't provide much evidence either way.

As tempting as it is to dismiss this surprising result with an appeal to the incentive structure, we're not going to escape that easily.

Were the forecasters stupid?

Aside from the implausibility of dozens of top superforecasters and domain experts being dumb, both groups got easy questions right. The bio-risks questions are a good benchmark here: there are centuries' worth of data on non-genetically-engineered plagues to give us base rates, and these suggest roughly a 25% chance per century, or about 20% between now and 2100. But we have better epidemiology and medicine than most of the centuries in our dataset. The experts said 8% chance and the superforecasters said 4% chance, and both of those seem like reasonable interpretations of the historical data to me.

The "WHO declares emergency" question is even easier - just look at how often it's done that in the past and extrapolate forward. Both superforecasters and experts mostly did that. Likewise, lots of scientists have put a lot of work into modeling the climate, there aren't many surprises there, and everyone basically agreed on the extent of global warming.

Wherever there was clear past data, both superforecasters and experts were able to use it correctly and get similar results. It was only when they started talking about things that had never happened before - global nuclear war, bioengineered pandemics, and AI - that they started disagreeing.
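As an aside, the base-rate arithmetic above is easy to check. Here's a quick back-of-envelope sketch - my own illustration, assuming only the ~25%-per-century figure and the 2022 - 2100 horizon from the tournament, not anything from FRI's actual methodology:

```python
# Back-of-envelope check of the pandemic base rate quoted above:
# ~25% per century, applied to the ~78 years between 2022 and 2100.

per_century = 0.25
years_left = 2100 - 2022  # 78

# Simple linear scaling of the century rate:
linear = per_century * years_left / 100                  # ~19.5%

# Or treat it as a constant annual hazard and compound it:
annual_hazard = 1 - (1 - per_century) ** (1 / 100)       # ~0.29% per year
compounded = 1 - (1 - annual_hazard) ** years_left       # ~20.1%

print(f"linear: {linear:.1%}, constant hazard: {compounded:.1%}")  # both ~20%
```

Either way you slice it, the historical base rate alone gets you to roughly 20% by 2100, which is why both groups' lower answers read as "we have better medicine now" rather than as a math error.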
Were the participants out of their depth?

Peter McCluskey, one of the more-AI-concerned superforecasters in the tournament, wrote about his experience on Less Wrong. Quoting liberally:

The "won't understand causality" and "what if it's all hype" objections really don't impress me. Many of the people in this tournament hadn't really encountered arguments about AI extinction before (potentially including the "AI experts", if they were just eg people who make robot arms or something), and a couple of months of back-and-forth discussion in the middle of a dozen other questions probably isn't enough for even a smart person to wrap their brain around the topic.

Was this tournament done so long ago that it has been outpaced by recent events?

The tournament was conducted in summer 2022. This was before ChatGPT, let alone GPT-4. The conversation around AI noticeably changed pitch after these two releases. Maybe that affected the results?

In fact, the participants have already been caught flat-footed on one question: a recent leak suggested that the cost of training GPT-4 was $63 million, which means the superforecasters' median estimate of $35 million by 2024 has already been proven too low. I don't know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off too.

There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn't quite passed the exact Turing Test described in the study, but it seems very close - so close that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters.

So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that's probably comparably advanced to this tournament. They have a question suspiciously similar to XPT's global catastrophe framing: in summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters' 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it's gone up to 38%. Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters' probabilities rose by the same factor as the Metaculus forecasters', they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%.

But the main thing we've updated on since 2022 is that AI might come sooner - and most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we'll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. So maybe we shouldn't expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn't.

Overall

I don't think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecasts based on recent events, but this would only change the estimates a little.
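For concreteness, here's what that "slight increase" looks like if you simply scale the XPT superforecasters' numbers by the ratio the Metaculus forecasts moved - a rough sketch of the arithmetic a few paragraphs up, not anything the tournament itself did:

```python
# Rough sketch: scale the XPT superforecaster numbers by how much Metaculus moved.
# The inputs are the figures quoted above; the method is just proportional scaling.

xpt_total_catastrophe = 0.09              # XPT superforecasters, summer 2022
metaculus_2022, metaculus_2023 = 0.30, 0.38

scaled_total = xpt_total_catastrophe * (metaculus_2023 / metaculus_2022)
print(f"updated total catastrophe risk: {scaled_total:.0%}")   # ~11%

# For AI catastrophe, Metaculus went from 6% to 15%, a 2.5x increase; applying
# the same multiplier to the (smaller) XPT number is how you land at roughly 5%.
ai_ratio = 0.15 / 0.06
print(f"AI catastrophe multiplier: {ai_ratio:.1f}x")
```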
Breaking Down The AI Estimate

How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn't? You can think of AI extinction as happening in a series of steps:

This isn't a perfect breakdown. Steps 2 and 3 are complicated: some early AIs will be misaligned, but it won't be a problem because they're too weak to hurt us (ChaosGPT is already misaligned!). But if we define (2) as "the first AI capable of killing all humans", then (3) is 100% by definition. Still, there ought to be some decomposition like this. Where do I and the superforecasters part ways?

Question 51 asks when we will have AGI (the resolution criterion is basically that whatever Nick Bostrom says goes). Everyone agrees it's pretty likely we'll have AGI (as per Bostrom) by 2100, although the domain experts are a little more convinced than the superforecasters. There was no question about when or whether we'll have superintelligence. Metaculus thinks superintelligence will come very shortly after human-level intelligence, and this is the conclusion of the best models and analyses I've seen as well. Still, I don't know if the superforecasters here also believed this.

At this point I've kind of exhausted the information I have from XPT, so I'm going to switch to Metaculus and hope it's a good enough window into forecasters' thought processes to transfer over. Metaculus wasn't really built for this and I have to fudge a lot of things, but based on this, this, and this question, plus this synthesis, here's my interpretation of what they're thinking: most of them expect superintelligence to happen and not cause a giant catastrophe, although there is a much higher chance it just goes vaguely badly somehow and produces a society which is bad to live in. This last part is an inference from many other conditional probabilities plus this question and probably shouldn't be taken too seriously.

Eyeballing the XPT answers, I think they split more like 80-20 on the AGI-by-2100 question, and I would expect them to split more like 50-50 on the superintelligence-conditional-on-AGI question. That's enough to explain the decreased risk of AGI catastrophe without necessarily bringing in more optimistic beliefs about alignment. It's not enough to explain the decreased risk of extinction, so the XPT forecasters must believe there's a higher chance AGI kills many people but not everyone. This could either be because humans fight a war with AGI and win, because AGI causes an "unintentional" catastrophe without being agentic enough to finish the job (eg helps terrorists build a bioweapon), or because AGI defeats humans but lets them continue existing in some form.
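To see how much work those two early splits can do, here's a minimal toy version of that decomposition with made-up step probabilities - the only inputs taken from the discussion above are the 80-20 and 50-50 splits, and the final "alignment" step is deliberately held fixed:

```python
# Toy version of the decomposition discussed above: a chain of conditional
# probabilities whose product is the headline AI-catastrophe number.
# All probabilities are made up for illustration; only the 80-20 and 50-50
# splits come from my eyeballing of the XPT answers.

from math import prod

def headline_risk(steps: dict) -> float:
    """Multiply a chain of conditional probabilities into one headline number."""
    return prod(steps.values())

# Hypothetical "more AI-worried" forecaster: very confident in the first two steps.
worried = {
    "AGI by 2100": 0.95,
    "superintelligence | AGI": 0.90,
    "catastrophe | superintelligence": 0.30,
}

# Hypothetical XPT-style skeptic: same view of the last step, but 80-20 on AGI
# and 50-50 on superintelligence given AGI.
skeptic = {
    "AGI by 2100": 0.80,
    "superintelligence | AGI": 0.50,
    "catastrophe | superintelligence": 0.30,
}

print(f"worried: {headline_risk(worried):.0%}")   # ~26%
print(f"skeptic: {headline_risk(skeptic):.0%}")   # ~12%
```

Obviously the real disagreement isn't this clean, but it shows how the early steps can roughly halve the headline number on their own - and how getting all the way down to the tournament's figures needs something more.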
Final Thoughts: Athanasius Contra Mundum

Are you allowed to look at a poll of all the world's top experts plus the superforecasters who have been right most often before, correctly incentivized and aggregated using cutting-edge techniques, and say "yeah, okay, but I disagree"?

There's been a lot of discussion about this comic and the ideas behind it recently. Can ordinary people disagree with "the experts"? If so, when and how? My usual answers are that this is sometimes permissible: sometimes because official expert bodies are operating under bad incentive constraints, other times because the people involved don't understand statistics/rationality/predictions very well.

This study could have been deliberately designed to make me sweat. It was a combination of well-incentivized experts with no ulterior motives plus masters of statistics/rationality/predictions. All of my usual arguments have been stripped away. I think there's a 33% chance of AI extinction, but this tournament estimated 0.3 - 5%. Should I be forced to update?

This is a hard question, and it got me thinking about what "forced to update" even means.

The Inside View Theory Of Updating is that you consult the mysterious lobe of your brain that handles these kinds of things and ask it what it thinks. If it returns a vague feeling of about 33% certainty, then your probability is 33%. You can feed that brain lobe statements like "by the way, you know that all of the top experts and superforecasters and so on think this will definitely not happen, right?" and then it will do mysterious brain things, and at the end of those mysterious brain things it will still feel about 33% certain, and you should still describe yourself as thinking there's a 33% chance.

The Outside View Theory is more like - you think about all the people who disagree with the experts. There are those people who think the moon landing was fake, and the experts tell them they're wrong, and they refuse to update, and you think that's a really bad decision. There are those people who think COVID vaccines don't work, and ditto. When you think of those people, you wish they would have the sense to just ignore whatever their mysterious reasoning lobes are telling them and trust the experts instead. But then it seems kind of hypocritical if you don't also defer to the experts when it's your turn to disagree with them. By "hypocritical" I mean both a sort of epistemic failure, where you're asserting a correct reasoning procedure but then refusing to follow it - and also a sort of moral failure, where your wish that they would change their minds won't be honored by the Gods of Acausal Trade unless you also change your mind.

You can compromise between these views. One compromise is that you should meditate very hard on the Outside View and see if it makes your mysterious brain lobe update its probability. If it doesn't, uh, meditate harder, I guess. Another compromise is to agree to generally act based on the Outside View in order to be a good citizen, while keeping your Inside View estimate intact so that everyone else doesn't double-update on your opinions or cause weird feedback loops and cascades.

The strongest consideration pushing me towards the Inside View on this topic is Peter McCluskey's account linked earlier. When I think of vague "experts" applying vague "expertise" to the problem, I feel tempted to update. But when I hear their actual arguments, and they're the same dumb arguments as all the other people I roll my eyes at, it's harder to take them seriously. Still, the considerations for the Outside View aren't entirely without force, so I suppose I update to more like a 20 - 25% chance. This is still pretty far from the top tournament-justifiable probability of 5%, or even a tournament-justifiable-updated-by-recent-events probability of 5 - 10%. But it's the lowest I can make the mysterious-number-generating lobe of my brain go before it threatens to go on strike in protest.

I'm heartened to remember that the superforecasters and domain experts in this study did the same. Confronted with the fact that domain experts/superforecasters had different estimates than they did, superforecasters/domain experts refused to update, and ended an order of magnitude away from each other. That seems like an endorsement of non-updating from superforecasters and domain experts! And who am I to disagree with such luminaries? It would be like trying to take over a difficult plane-landing from a pilot!
Far better to continue stubbornly disagreeing with domain experts and superforecasters, just like my role models the superforecasters and domain experts do.