Astral Codex Ten - Mantic Monday 9/16/24
Probably No Superintelligent Forecaster Yet
FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them.
You can read the announcement here and play with the model itself here. (kudos to the team for making the model publicly available, especially since these things usually have high inference costs)
The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways.
The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%.
Manifold is skeptical:
The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average.
But also, my attempts to play around with the bot haven’t been encouraging:
The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified. They conclude:
Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out!
r/MarkMyWords
This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.
…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.
…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten.
Still, it might be fun to keep going until you find an old post where the prediction has already “resolved”, and see what happens. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):
…okay, that wasn’t fun or interesting either. Also, it’s really hard (there are a lot more new posts than old ones). But I bet it’ll be fun to try the same thing a year or so after the election.
Polymarket Is Rolling In Cash
We talk about a lot of topics here. AI forecasters. Brier scores. Fixing science. But the average person is in forecasting for one thing: betting on presidential elections. Here’s Polymarket’s volume (in dollars bet) over time (source):
Some of this is no doubt due to the hard work of Shayne and his team improving the site. But let’s be honest. It’s mostly because people really want to bet money on Trump/Harris 2024. The presidential market has a total volume of $910 million, far above eg markets about the Super Bowl ($50 million), the World Series ($5 million), and the bird flu epidemic ($141,000).
Even a 1% fee on all this trading would make Polymarket a lot of money. But they . . . don’t really seem to charge fees? According to Forbes (paywalled):
They’re rolling in money, it’s just not their money. Yet.
Still, it’s hard to overstate their dominance. Remember, their presidential election market has $910 million. For their competitor, PredictIt, the same number is $37 million. Kalshi doesn’t have election bets (more on this later) but their biggest markets look to be in the $2 - $5 million range.
Along with the cash, they’re collecting prestige and endorsements. Nate Silver recently joined their advisory board. And their Substack newsletter is lots of fun:
I don’t talk about Polymarket much because they’re not doing anything too far-out or experimental. They don’t have the strongest accuracy track record, and they don’t have the most diverse markets. Still, they’ve carried out their fundamentals really well, with great UI, market making, and ability to navigate legal storms. From a business perspective, they’re the standout winners of the early 2020s bumper crop of prediction markets.
This Month In The Markets
1: You knew it was coming:
See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.
2: Why did this go down so much in April 2024?
3: I originally thought this was about Strawberry, but the timing is wrong: it’s a Google DeepMind AI that got just short of the gold threshold back in July. People seemed genuinely surprised by this!
4:
5: I hadn’t even heard of this theory before; you can learn more here:
6: Finally, prediction markets returning to their roots:
7:
Forecasting Links
1: Trouble in England as politicians are accused of betting on political topics. In July, some MPs bet on when an election would be held; during the election, one bet £8,000 that he would lose his seat (he did).
It’s illegal for people with nonpublic information to bet on political topics, but so far nobody is formally accusing the people involved of having nonpublic information. And the sums involved (£100 for one of the most scandalous election bets) suggest these aren’t exactly grand schemes. I file this under “need to avoid appearance of impropriety” more than “criminal mastermind”.
2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute but there are a lot of wrinkles to work out. For example, if I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me by making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as being an incautious bettor.
3: Kalshi vs. CFTC, round one million: after CFTC banned Kalshi from hosting political contracts last year, Kalshi appealed. Earlier this month, the judge sided with Kalshi, saying that the CFTC’s attempt to define elections as “gaming” so it can regulate them under anti-gaming laws is an illegal power grab. The judge claims this has no relevance to the CFTC’s broader anti-political-market push, but since the whole thing is based on the elections = gaming theory I think it has a lot of relevance indeed.
The CFTC has since appealed, and Kalshi is blocked from hosting the contracts until the appeal goes through (it’s 49 days until the election; at this point even a pro-Kalshi ruling might be a Pyrrhic victory). Also, why is Kalshi trying to get Congress contracts up, but not a Presidency contract? More sympathetic test case?