Astral Codex Ten - Mantic Monday 9/16/24
Probably No Superintelligent Forecaster Yet

FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them. You can read the announcement here and play with the model itself here. (kudos to the team for making the model publicly available, especially since these things usually have high inference costs)

The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways. The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%.

Manifold is skeptical:

The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average.

But also, my attempts to play around with the bot haven’t been encouraging:
The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified. They conclude:
Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out!

r/MarkMyWords

This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.

…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.

…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten.

Still, it might be fun to keep going until you find an old post where the prediction has already “resolved”, and see what happens. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):
…okay, that wasn’t fun or interesting either. Also, it’s really hard (there are a lot more new posts than old ones). But I bet it’ll be fun to try the same thing a year or so after the election.

Polymarket Is Rolling In Cash

We talk about a lot of topics here. AI forecasters. Brier scores. Fixing science. But the average person is in forecasting for one thing: betting on presidential elections.

Here’s Polymarket’s volume (in dollars bet) over time (source):

Some of this is no doubt due to the hard work of Shayne and his team improving the site. But let’s be honest. It’s mostly because people really want to bet money on Trump/Harris 2024. The presidential market has a total volume of $910 million, far above eg markets about the Superbowl ($50 million), the World Series ($5 million), and the bird flu epidemic ($141,000).

Even a 1% fee on all this trading would make Polymarket a lot of money. But they . . . don’t really seem to charge fees? According to Forbes (paywalled):
They’re rolling in money, it’s just not their money. Yet.

Still, it’s hard to overstate their dominance. Remember, their presidential election market has $910 million. For their competitor, PredictIt, the same number is $37 million. Kalshi doesn’t have election bets (more on this later) but their biggest markets look to be in the $2 - $5 million range.

Along with the cash, they’re collecting prestige and endorsements. Nate Silver recently joined their advisory board. And their Substack newsletter is lots of fun:

I don’t talk about Polymarket much because they’re not doing anything too far-out or experimental. They don’t have the strongest accuracy track record, and they don’t have the most diverse markets. Still, they’ve carried out their fundamentals really well, with great UI, market making, and ability to navigate legal storms. From a business perspective, they’re the standout winners of the early 2020s bumper crop of prediction markets.

This Month In The Markets

1: You knew it was coming:

See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.

2: Why did this go down so much in April 2024?

3: I originally thought this was about Strawberry, but the timing is wrong: it’s a Google DeepMind AI that got just short of the gold threshold back in July. People seemed genuinely surprised by this!

4:

5: I hadn’t even heard of this theory before; you can learn more here:

6: Finally, prediction markets returning to their roots:

7:

Forecasting Links

1: Trouble in England as politicians are accused of betting on political topics. In July, some MPs bet on when an election would be held; during the election, one bet £8,000 that he would lose his seat (he did).
It’s illegal for people with nonpublic information to bet on political topics, but so far nobody is formally accusing the people involved of having nonpublic information. And the sums involved (£100 for one of the most scandalous election bets) suggest these aren’t exactly grand schemes. I file this under “need to avoid appearance of impropriety” more than “criminal mastermind”.

2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute, but there are a lot of wrinkles to work out. For example, if I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me by making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as being an incautious bettor.

3: Kalshi vs. CFTC, round one million: after the CFTC banned Kalshi from hosting political contracts last year, Kalshi appealed. Earlier this month, the judge sided with Kalshi, saying that the CFTC’s attempt to define elections as “gaming” so it can regulate them under anti-gaming laws is an illegal power grab. The judge claims this has no relevance to the CFTC’s broader anti-political-market push, but since the whole thing is based on the elections = gaming theory I think it has a lot of relevance indeed.
The CFTC has since appealed, and Kalshi is blocked from hosting the contracts until the appeal goes through (it’s 49 days until the election; at this point even a pro-Kalshi ruling might be a Pyrrhic victory). Also, why is Kalshi trying to get Congress contracts up, but not a Presidency contract? More sympathetic test case?