Astral Codex Ten - Mantic Monday 4/24/23
Can AIs Predict The Future? By Which We Mean The Past?

If we asked GPT-4 to play a prediction market, how would it do? Actual GPT-4 probably would just give us some boring boilerplate about how the future is uncertain and it’s irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it?

This would take years to test, as we waited for the events it predicted to happen. So instead, what if we took a model trained off text from some particular year (let’s say 2020) and asked it to predict forecasting questions about the period 2020 - 2023. Then we could check its results immediately!

This is the basic idea behind Zou et al (2022), Forecasting Future World Events With Neural Networks. They create a dataset, Autocast, with 6000 questions from forecasting tournaments Metaculus, Good Judgment Project, and CSET Foretell. Then they ask their AI (a variant of GPT-2) to predict them, given news articles up to some date before the event happened. Here’s their result:

. . . okay, this isn’t very interesting. GPT-2, a very weak obsolete AI, is able to do better than chance, but much worse than humans. I don’t know what I was expecting.

This paper isn’t interesting because the AI did well (it didn’t). It’s interesting as the first foray into quantifying AI forecasting ability. Sometime soon, someone will test how a GPT-3 or GPT-4 sized model does at this task. Probably it will do better. How much better? I’m pretty curious.

Can a big enough language model equal humans at forecasting? What would we do with it if it could? The authors write:
The problem with forecasting tournaments is that there are only so many superforecasters in the world, and you can’t make them spend a lot of time considering every question you’re interested in. Real money prediction markets try to solve this by creating an incentive to participate in them, but they’re mostly illegal. Good AI forecasters would solve this problem and let forecasting scale.

You can access their dataset here. The authors were originally planning to host a competition to see who could create the best AI forecaster, but due to financial constraints they’ll be running only a reduced version. You can read more about the semi-competition here.

Metaculus Looking Good

Two new reports say nice things about Metaculus’ accuracy.

Vasco Grilo finds it’s much better than low information priors. A simple low-information prior is a coin flip - betting 50% on all yes-no questions. But you can do better: if only 16% of previous Metaculus predictions on politics have resolved true (maybe because question-makers like asking about outlandish possibilities), you can bet 16% chance for the next politics question. Vasco tries some things a little more sophisticated than that, but he finds Metaculus always beats the prior. We should expect that - expert opinion is better than random guessing - but it’s always good to be sure.

This also lets us compare how accurate forecasts are in different categories. For example, we see here that AI forecasts have less of an advantage over low-information priors than the average, suggesting that this field is especially tough to predict. But there’s still an advantage. Next time someone tries to tell you that AI is IMPOSSIBLE TO PREDICT and ABSOLUTELY ANYTHING CAN HAPPEN, tell them that actually forecasters achieved a Brier score of 0.160 in AI predictions when guessing the low-information prior would only have given them 0.248.
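For readers who want to see how a comparison against a low-information prior works mechanically, here is a minimal sketch. It is not Vasco's actual analysis; the question outcomes are made up for illustration, and the function names `brier_score` and `base_rate` are my own. It only assumes the usual Brier score for yes-no questions: the mean squared difference between the forecast probability and the 0/1 outcome, where lower is better.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


def base_rate(past_outcomes):
    """Fraction of past questions that resolved yes; used as a constant prior."""
    return sum(past_outcomes) / len(past_outcomes)


# Hypothetical data (not from Metaculus): 100 past questions, 16 resolved yes.
past = [1] * 16 + [0] * 84
# Hypothetical new questions that happen to resolve yes at the same 16% rate.
new = [1] * 4 + [0] * 21

# Prior 1: coin flip, 50% on everything.
coin_flip_score = brier_score([0.5] * len(new), new)

# Prior 2: bet the historical base rate on everything.
prior = base_rate(past)
base_rate_score = brier_score([prior] * len(new), new)

print(f"coin flip: {coin_flip_score:.4f}")
print(f"base rate ({prior:.2f}): {base_rate_score:.4f}")
```

A coin flip always scores exactly 0.25, whatever happens. A constant base-rate guess of p, on questions that really do resolve yes a fraction p of the time, scores p(1 - p), which is about 0.134 in this toy setup. A forecaster only demonstrates skill by scoring below whichever prior is the relevant benchmark.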
Nikos Bosse compares Metaculus’ performance to its “competitor” Manifold Markets, and finds that overall Metaculus was more accurate:
Does this mean that forecasting tournaments are better than prediction markets? Some past studies have provided very tentative evidence in that direction, but this one probably doesn’t - many more people use Metaculus than Manifold, and Nikos didn’t control for number of forecasters. Nikos also gives us this beautiful graph showing how forecasts on the two platforms track each other: He concludes:
This Month In The Markets

This was the forecast I found myself most interested in this month, and it seems like Manifold has a strong opinion. The drop a few days ago was when Sam Altman said OpenAI wasn’t currently training GPT-5 and “won’t for some time”. Apparently forecasters don’t expect them to take too long a break.

We’ve talked before about LLMs playing chess; they can sort of do it, but they’re not very good yet. The market thinks 34% chance they’ll get much better in the next five years; I think my estimate is lower.

Polymarket is dipping its toes into AI forecasting. This particular one is off to a tough start: GPT-4 came out a month or so after this market was launched, but OpenAI hasn’t said how many parameters it has. You can see all open AI questions (currently just three) here.

Also on Polymarket: Manifold is about the same on the same question. Metaculus’s fancy date prediction system lets them be more specific: . . . and also seems pretty sure it will be late this year.

Remember when Elon Musk said he would step down as CEO of Twitter? You can see that at the December 2022 mark here - looks like some people made a lot of money buying the dip.

I think of this question as tracking the rise of interest in prediction markets among sci/tech celebrities. Podcaster Lex Fridman (2.7 million Twitter followers) joined Manifold and bet M$100 on himself, causing his shares to soar (they are now worth M$188). He still has not created a market.

This is my Long Bet with Samo Burja - the resolution criteria are slightly different, but close enough to make me feel a little more confident I’m on the right side.

Shorts

1: Metaculus announces Conditional Pairs, where you can create questions that explore the relationship between two events, eg “if the US does/doesn’t default on its debt, will a Democrat win the 2024 election?”

2: Nuno Sempere: Tracking The Money Flows In Forecasting. EG Metaculus runs off of ~$6M in grants; Kalshi has $30M in VC funding. Gnosis, a crypto protocol that never went anywhere, apparently had a $230M market cap at one point, but this is probably some kind of fake crypto valuation trick.

4: Which is better - just looking at the few best forecasters, or fully using the wisdom of crowds? Nikos says it’s the crowds.