Astral Codex Ten - Mantic Monday 9/16/24
Probably No Superintelligent Forecaster Yet

FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them. You can read the announcement here and play with the model itself here. (kudos to the team for making the model publicly available, especially since these things usually have high inference costs)

The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways. The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%.

Manifold is skeptical:

The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average.

But also, my attempts to play around with the bot haven’t been encouraging:
The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified. They conclude:
Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out!

r/MarkMyWords

This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.

…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.

…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten.

Still, it might be fun to keep going until you find an old post where the prediction has already “resolved”, and see what happens. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):
…okay, that wasn’t fun or interesting either. Also, it’s really hard (there are a lot more new posts than old ones). But I bet it’ll be fun to try the same thing a year or so after the election.

Polymarket Is Rolling In Cash

We talk about a lot of topics here. AI forecasters. Brier scores. Fixing science. But the average person is in forecasting for one thing: betting on presidential elections.

Here’s Polymarket’s volume (in dollars bet) over time (source):

Some of this is no doubt due to the hard work of Shayne and his team improving the site. But let’s be honest. It’s mostly because people really want to bet money on Trump/Harris 2024. The presidential market has a total volume of $910 million, far above eg markets about the Super Bowl ($50 million), the World Series ($5 million), and the bird flu epidemic ($141,000).

Even a 1% fee on all this trading would make Polymarket a lot of money. But they . . . don’t really seem to charge fees? According to Forbes (paywalled):
They’re rolling in money, it’s just not their money. Yet.

Still, it’s hard to overstate their dominance. Remember, their presidential election market has $910 million. For their competitor, PredictIt, the same number is $37 million. Kalshi doesn’t have election bets (more on this later) but their biggest markets look to be in the $2 - $5 million range.

Along with the cash, they’re collecting prestige and endorsements. Nate Silver recently joined their advisory board. And their Substack newsletter is lots of fun:

I don’t talk about Polymarket much because they’re not doing anything too far-out or experimental. They don’t have the strongest accuracy track record, and they don’t have the most diverse markets. Still, they’ve carried out their fundamentals really well, with great UI, market making, and ability to navigate legal storms. From a business perspective, they’re the standout winners of the early 2020s bumper crop of prediction markets.

This Month In The Markets

1: You knew it was coming:

See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.

2: Why did this go down so much in April 2024?

3: I originally thought this was about Strawberry, but the timing is wrong: it’s a Google DeepMind AI that got just short of the gold threshold back in July. People seemed genuinely surprised by this!

4:

5: I hadn’t even heard of this theory before; you can learn more here:

6: Finally, prediction markets returning to their roots:

7:

Forecasting Links

1: Trouble in England as politicians are accused of betting on political topics. In July, some MPs bet on when an election would be held; during the election, one bet £8,000 that he would lose his seat (he did).
It’s illegal for people with nonpublic information to bet on political topics, but so far nobody is formally accusing the people involved of having nonpublic information. And the sums involved (£100 for one of the most scandalous election bets) suggest these aren’t exactly grand schemes. I file this under “need to avoid appearance of impropriety” more than “criminal mastermind”.

2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute but there are a lot of wrinkles to work out. For example, if I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me by making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as being an incautious bettor.

3: Kalshi vs. CFTC, round one million: after CFTC banned Kalshi from hosting political contracts last year, Kalshi appealed. Earlier this month, the judge sided with Kalshi, saying that the CFTC’s attempt to define elections as “gaming” so it can regulate them under anti-gaming laws is an illegal power grab. The judge claims this has no relevance to the CFTC’s broader anti-political-market push, but since the whole thing is based on the elections = gaming theory I think it has a lot of relevance indeed.
The CFTC has since appealed, and Kalshi is blocked from hosting the contracts until the appeal goes through (it’s 49 days until the election; at this point even a pro-Kalshi ruling might be a Pyrrhic victory). Also, why is Kalshi trying to get Congress contracts up, but not a Presidency contract? More sympathetic test case?