Astral Codex Ten - Mantic Monday 4/24/23
Can AIs Predict The Future? By Which We Mean The Past?
If we asked GPT-4 to play a prediction market, how would it do? Actual GPT-4 probably would just give us some boring boilerplate about how the future is uncertain and it’s irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it? This would take years to test, as we waited for the events it predicted to happen. So instead, what if we took a model trained on text from some particular year (let’s say 2020) and asked it to predict forecasting questions about the period 2020 - 2023? Then we could check its results immediately!
This is the basic idea behind Zou et al (2022), Forecasting Future World Events With Neural Networks. They create a dataset, Autocast, with 6000 questions from forecasting tournaments Metaculus, Good Judgment Project, and CSET Foretell. Then they ask their AI (a variant of GPT-2) to predict them, given news articles up to some date before the event happened. Here’s their result:
. . . okay, this isn’t very interesting. GPT-2, a very weak obsolete AI, is able to do better than chance, but much worse than humans. I don’t know what I was expecting.
This paper isn’t interesting because the AI did well (it didn’t). It’s interesting as the first foray into quantifying AI forecasting ability. Sometime soon, someone will test how a GPT-3 or GPT-4 sized model does at this task. Probably it will do better. How much better? I’m pretty curious.
Can a big enough language model equal humans at forecasting? What would we do with it if it could? The authors write:
The problem with forecasting tournaments is that there are only so many superforecasters in the world, and you can’t make them spend a lot of time considering every question you’re interested in. Real money prediction markets try to solve this by creating an incentive to participate in them, but they’re mostly illegal. Good AI forecasters would solve this problem and let forecasting scale.
You can access their dataset here. The authors were originally planning to host a competition to see who could create the best AI forecaster, but due to financial constraints they’ll be running only a reduced version. You can read more about the semi-competition here.
Metaculus Looking Good
Two new reports say nice things about Metaculus’ accuracy.
Vasco Grilo finds it’s much better than low information priors. A simple low-information prior is a coin flip - betting 50% on all yes-no questions. But you can do better: if only 16% of previous Metaculus predictions on politics have resolved true (maybe because question-makers like asking about outlandish possibilities), you can bet 16% chance for the next politics question. Vasco tries some things a little more sophisticated than that, but he finds Metaculus always beats the prior. We should expect that - expert opinion is better than random guessing - but it’s always good to be sure.
This also lets us compare how accurate forecasts are in different categories. For example, we see here that AI forecasts have less of an advantage over low-information priors than the average, suggesting that this field is especially tough to predict. But there’s still an advantage. Next time someone tries to tell you that AI is IMPOSSIBLE TO PREDICT and ABSOLUTELY ANYTHING CAN HAPPEN, tell them that actually forecasters achieved a Brier score of 0.160 in AI predictions when guessing the low-information prior would only have given them 0.248.
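For readers who want to see what a comparison like this involves, here is a minimal sketch of the Brier score (the mean squared difference between forecast probabilities and 0/1 outcomes, lower is better) applied to a coin flip, a base-rate prior, and a crowd forecast. This is not Vasco's code or Metaculus data: the function names `brier_score` and `base_rate` and every number in the toy lists are made up for illustration.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; lower is better.
    """
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("need equally many forecasts and outcomes, at least one")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def base_rate(past_outcomes: Sequence[int]) -> float:
    """Low-information prior: the fraction of past questions that resolved yes."""
    return sum(past_outcomes) / len(past_outcomes)


# Hypothetical toy data, not taken from Metaculus.
past = [1, 0, 0, 0, 0]             # earlier questions in a category: 20% resolved yes
outcomes = [0, 0, 1, 0, 0]         # how the new questions resolved
crowd = [0.1, 0.2, 0.7, 0.1, 0.3]  # made-up community forecasts

coin_flip = brier_score([0.5] * len(outcomes), outcomes)
prior = brier_score([base_rate(past)] * len(outcomes), outcomes)
community = brier_score(crowd, outcomes)

print(f"coin flip: {coin_flip:.3f}")
print(f"base rate: {prior:.3f}")
print(f"community: {community:.3f}")
```

A 50% forecast scores exactly 0.25 on any yes-no question however it resolves, so the 0.248 quoted above is only slightly better than a coin flip, while 0.160 is a real improvement.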
Nikos Bosse compares Metaculus’ performance to its “competitor” Manifold Markets, and finds that overall Metaculus was more accurate:
Does this mean that forecasting tournaments are better than prediction markets? Some past studies have provided very tentative evidence in that direction, but this one probably doesn’t - many more people use Metaculus than Manifold, and Nikos didn’t control for number of forecasters. Nikos also gives us this beautiful graph showing how forecasts on the two platforms track each other: He concludes:
This Month In The Markets
This was the forecast I found myself most interested in this month, and it seems like Manifold has a strong opinion. The drop a few days ago was when Sam Altman said OpenAI wasn’t currently training GPT-5 and “won’t for some time”. Apparently forecasters don’t expect them to take too long a break.
We’ve talked before about LLMs playing chess; they can sort of do it, but they’re not very good yet. The market thinks 34% chance they’ll get much better in the next five years; I think my estimate is lower.
Polymarket is dipping its toes into AI forecasting. This particular one is off to a tough start: GPT-4 came out a month or so after this market was launched, but OpenAI hasn’t said how many parameters it has. You can see all open AI questions (currently just three) here.
Also on Polymarket:
Manifold is about the same on the same question. Metaculus’s fancy date prediction system lets them be more specific:
. . . and also seems pretty sure it will be late this year.
Remember when Elon Musk said he would step down as CEO of Twitter? You can see that at the December 2022 mark here - looks like some people made a lot of money buying the dip.
I think of this question as tracking the rise of interest in prediction markets among sci/tech celebrities. Podcaster Lex Fridman (2.7 million Twitter followers) joined Manifold and bet M$100 on himself, causing his shares to soar (they are now worth M$188). He still has not created a market.
This is my Long Bet with Samo Burja - the resolution criteria are slightly different, but close enough to make me feel a little more confident I’m on the right side.
Shorts
1: Metaculus announces Conditional Pairs, where you can create questions that explore the relationship between two events, eg “if the US does/doesn’t default on its debt, will a Democrat win the 2024 election?”
2: Nuno Sempere: Tracking The Money Flows In Forecasting. EG Metaculus runs off of ~$6M in grants; Kalshi has $30M in VC funding.
Gnosis, a crypto protocol that never went anywhere, apparently had a $230M market cap at one point, but this is probably some kind of fake crypto valuation trick.
4: Which is better - just looking at the few best forecasters, or fully using the wisdom of crowds? Nikos says it’s the crowds.