Astral Codex Ten - Mantic Monday 4/24/23
Can AIs Predict The Future? By Which We Mean The Past?

If we asked GPT-4 to play a prediction market, how would it do? Actual GPT-4 would probably just give us some boring boilerplate about how the future is uncertain and it's irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it?

Testing this directly would take years, because we would have to wait for the predicted events to happen. So instead, what if we took a model trained on text from some particular year (let's say 2020) and asked it to answer forecasting questions about the period 2020 to 2023? Then we could check its results immediately!

This is the basic idea behind Zou et al (2022), Forecasting Future World Events With Neural Networks. They create a dataset, Autocast, with 6,000 questions from the forecasting tournaments Metaculus, Good Judgment Project, and CSET Foretell. Then they ask their AI (a variant of GPT-2) to predict them, given news articles up to some date before the event happened. Here's their result:

. . . okay, this isn't very interesting. GPT-2, a very weak obsolete AI, is able to do better than chance, but much worse than humans. I don't know what I was expecting.

This paper isn't interesting because the AI did well (it didn't). It's interesting as the first foray into quantifying AI forecasting ability. Sometime soon, someone will test how a model the size of GPT-3 or GPT-4 does at this task. Probably it will do better. How much better? I'm pretty curious. Can a big enough language model equal humans at forecasting? What would we do with it if it could? The authors write:
The problem with forecasting tournaments is that there are only so many superforecasters in the world, and you can't make them spend a lot of time considering every question you're interested in. Real-money prediction markets try to solve this by creating an incentive to participate, but they're mostly illegal. Good AI forecasters would solve this problem and let forecasting scale.

You can access their dataset here. The authors were originally planning to host a competition to see who could create the best AI forecaster, but due to financial constraints they'll be running only a reduced version. You can read more about the semi-competition here.

Metaculus Looking Good

Two new reports say nice things about Metaculus' accuracy.

Vasco Grilo finds it's much better than low-information priors. A simple low-information prior is a coin flip: betting 50% on all yes-no questions. But you can do better: if only 16% of previous Metaculus questions on politics have resolved true (maybe because question-makers like asking about outlandish possibilities), you can bet 16% on the next politics question. Vasco tries some things a little more sophisticated than that, but he finds Metaculus always beats the prior. We should expect that (expert opinion is better than random guessing), but it's always good to be sure.

This also lets us compare how accurate forecasts are in different categories. For example, we see here that AI forecasts have less of an advantage over low-information priors than average, suggesting that this field is especially tough to predict. But there's still an advantage. Next time someone tries to tell you that AI is IMPOSSIBLE TO PREDICT and ABSOLUTELY ANYTHING CAN HAPPEN, tell them that forecasters actually achieved a Brier score of 0.160 on AI questions, when guessing the low-information prior would only have given them 0.248 (for Brier scores, lower is better).
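To make the Brier-score comparison concrete, here is a minimal Python sketch of how a Brier score is computed and how a per-category base rate can serve as a low-information prior. It is not Vasco's actual method or data. The function names, the categories, and every number below are hypothetical. It assumes the prior for a category is simply the fraction of past questions in that category that resolved yes.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared gap between probabilities and 0/1 outcomes; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def base_rate_priors(past_questions):
    """Map each category to the fraction of its past questions that resolved yes."""
    counts = defaultdict(lambda: [0, 0])  # category -> [yes_count, total]
    for category, resolved_yes in past_questions:
        counts[category][0] += resolved_yes
        counts[category][1] += 1
    return {cat: yes / total for cat, (yes, total) in counts.items()}

# Hypothetical resolved history: (category, 1 if resolved yes else 0)
history = [("politics", 0), ("politics", 1), ("politics", 0), ("politics", 0),
           ("ai", 1), ("ai", 0)]
priors = base_rate_priors(history)  # politics -> 0.25, ai -> 0.5

# Hypothetical new questions: (category, community forecast, outcome)
new_questions = [("politics", 0.10, 0), ("politics", 0.20, 0), ("ai", 0.80, 1)]
outcomes = [o for _, _, o in new_questions]

# Score three strategies on the same questions
coin_flip = brier_score([0.5] * len(new_questions), outcomes)
prior = brier_score([priors[c] for c, _, _ in new_questions], outcomes)
community = brier_score([p for _, p, _ in new_questions], outcomes)

print(f"coin flip {coin_flip:.3f}, base-rate prior {prior:.3f}, community {community:.3f}")
# coin flip 0.250, base-rate prior 0.125, community 0.030
```

A flat 50% forecast always scores exactly 0.25, whatever happens. A category base rate beats that whenever one outcome is much more common in that category, and informed forecasts can push the score lower still.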
Nikos Bosse compares Metaculus’ performance to its “competitor” Manifold Markets, and finds that overall Metaculus was more accurate:
Does this mean that forecasting tournaments are better than prediction markets? Some past studies have provided very tentative evidence in that direction, but this one probably doesn't add much: many more people use Metaculus than Manifold, and Nikos didn't control for the number of forecasters. Nikos also gives us this beautiful graph showing how forecasts on the two platforms track each other:

He concludes:
This Month In The Markets

This was the forecast I found myself most interested in this month, and it seems like Manifold has a strong opinion. The drop a few days ago came when Sam Altman said OpenAI wasn't currently training GPT-5 and "won't for some time". Apparently forecasters don't expect them to take too long a break.

We've talked before about LLMs playing chess. They can sort of do it, but they're not very good yet. The market gives a 34% chance they'll get much better in the next five years; my own estimate is lower.

Polymarket is dipping its toes into AI forecasting. This particular market is off to a tough start: GPT-4 came out a month or so after the market launched, but OpenAI hasn't said how many parameters GPT-4 has. You can see all open AI questions (currently just three) here.

Also on Polymarket:

Manifold is about the same on the same question. Metaculus' fancy date prediction system lets them be more specific:

. . . and also seems pretty sure it will be late this year.

Remember when Elon Musk said he would step down as CEO of Twitter? You can see that at the December 2022 mark here. It looks like some people made a lot of money buying the dip.

I think of this question as tracking the rise of interest in prediction markets among sci/tech celebrities. Podcaster Lex Fridman (2.7 million Twitter followers) joined Manifold and bet M$100 on himself, causing his shares to soar (they are now worth M$188). He still has not created a market.

This is my Long Bet with Samo Burja. The resolution criteria are slightly different, but close enough to make me feel a little more confident I'm on the right side.

Shorts

1: Metaculus announces Conditional Pairs, where you can create questions that explore the relationship between two events, e.g. "if the US does/doesn't default on its debt, will a Democrat win the 2024 election?"

2: Nuno Sempere: Tracking The Money Flows In Forecasting. For example, Metaculus runs on ~$6M in grants, and Kalshi has $30M in VC funding.
Gnosis, a crypto protocol that never went anywhere, apparently had a $230M market cap at one point, but this is probably some kind of fake crypto valuation trick.

4: Which is better: just looking at the few best forecasters, or fully using the wisdom of crowds? Nikos says it's the crowds.
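Returning to the first short item: a conditional pair has a built-in consistency check. Take A as the condition (the US defaults on its debt) and B as the outcome (a Democrat wins in 2024). By the law of total probability, the unconditional forecast for B should equal P(B|A)·P(A) + P(B|not A)·(1 − P(A)). Below is a minimal Python sketch of that identity. The function names and all the probabilities are hypothetical, chosen only for illustration. It is not a description of how Metaculus itself processes Conditional Pairs.

```python
def implied_unconditional(p_a, p_b_given_a, p_b_given_not_a):
    """Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)(1 - P(A))."""
    for p in (p_a, p_b_given_a, p_b_given_not_a):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    return p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)

def conditional_effect(p_b_given_a, p_b_given_not_a):
    """How much learning that A happened shifts the forecast for B."""
    return p_b_given_a - p_b_given_not_a

# Hypothetical forecasts, for illustration only
p_default = 0.05            # P(A): the US defaults on its debt
p_dem_if_default = 0.40     # P(B | A): a Democrat wins, given a default
p_dem_if_no_default = 0.50  # P(B | not A): a Democrat wins, given no default

p_dem = implied_unconditional(p_default, p_dem_if_default, p_dem_if_no_default)
shift = conditional_effect(p_dem_if_default, p_dem_if_no_default)
print(f"implied P(Democrat wins) = {p_dem:.3f}")  # 0.495
print(f"shift from a default = {shift:+.2f}")     # -0.10
```

If a separate unconditional market on B traded far from the implied value, at least one of the three forecasts would be inconsistent with the others. That kind of mismatch is part of what makes pairing questions informative.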