Astral Codex Ten - Mantic Monday 9/16/24
Probably No Superintelligent Forecaster Yet

FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them. You can read the announcement here and play with the model itself here. (kudos to the team for making the model publicly available, especially since these things usually have high inference costs)

The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways. The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%.

Manifold is skeptical:

The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average.

But also, my attempts to play around with the bot haven’t been encouraging:
The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified. They conclude:
Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out!

r/MarkMyWords

This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.

…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.

…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten.

Still, it might be fun to keep going until you find an old post where the prediction has already “resolved”, and see what happens. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):
…okay, that wasn’t fun or interesting either. Also, it’s really hard (there are a lot more new posts than old ones). But I bet it’ll be fun to try the same thing a year or so after the election.

Polymarket Is Rolling In Cash

We talk about a lot of topics here. AI forecasters. Brier scores. Fixing science. But the average person is in forecasting for one thing: betting on presidential elections.

Here’s Polymarket’s volume (in dollars bet) over time (source):

Some of this is no doubt due to the hard work of Shayne and his team improving the site. But let’s be honest. It’s mostly because people really want to bet money on Trump/Harris 2024. The presidential market has a total volume of $910 million, far above eg markets about the Superbowl ($50 million), the World Series ($5 million), and the bird flu epidemic ($141,000).

Even a 1% fee on all this trading would make Polymarket a lot of money. But they . . . don’t really seem to charge fees? According to Forbes (paywalled):
They’re rolling in money, it’s just not their money. Yet.

Still, it’s hard to overstate their dominance. Remember, their presidential election market has $910 million. For their competitor, PredictIt, the same number is $37 million. Kalshi doesn’t have election bets (more on this later) but their biggest markets look to be in the $2 - $5 million range.

Along with the cash, they’re collecting prestige and endorsements. Nate Silver recently joined their advisory board. And their Substack newsletter is lots of fun:

I don’t talk about Polymarket much because they’re not doing anything too far-out or experimental. They don’t have the strongest accuracy track record, and they don’t have the most diverse markets. Still, they’ve carried out their fundamentals really well, with great UI, market making, and ability to navigate legal storms. From a business perspective, they’re the standout winners of the early 2020s bumper crop of prediction markets.

This Month In The Markets

1: You knew it was coming:

See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.

2: Why did this go down so much in April 2024?

3: I originally thought this was about Strawberry, but the timing is wrong: it’s a Google DeepMind AI that got just short of the gold threshold back in July. People seemed genuinely surprised by this!

4:

5: I hadn’t even heard of this theory before; you can learn more here:

6: Finally, prediction markets returning to their roots:

7:

Forecasting Links

1: Trouble in England as politicians are accused of betting on political topics. In July, some MPs bet on when an election would be held; during the election, one bet £8,000 that he would lose his seat (he did).
It’s illegal for people with nonpublic information to bet on political topics, but so far nobody is formally accusing the people involved of having nonpublic information. And the sums involved (£100 for one of the most scandalous election bets) suggest these aren’t exactly grand schemes. I file this under “need to avoid appearance of impropriety” more than “criminal mastermind”.

2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute, but there are a lot of wrinkles to work out. For example, if I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me by making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as being an incautious bettor.

3: Kalshi vs. CFTC, round one million: after CFTC banned Kalshi from hosting political contracts last year, Kalshi appealed. Earlier this month, the judge sided with Kalshi, saying that the CFTC’s attempt to define elections as “gaming” so it can regulate them under anti-gaming laws is an illegal power grab. The judge claims this has no relevance to the CFTC’s broader anti-political-market push, but since the whole thing is based on the elections = gaming theory I think it has a lot of relevance indeed.
The CFTC has since appealed, and Kalshi is blocked from hosting the contracts until the appeal goes through (it’s 49 days until the election; at this point even a pro-Kalshi ruling might be a Pyrrhic victory). Also, why is Kalshi trying to get Congress contracts up, but not a Presidency contract? More sympathetic test case?