Astral Codex Ten - Mantic Monday 2/19/24
Mantic Monday 2/19/24AI forecasters come of age || Prediction market reality TV dating show? || OpenAI SoraManifold has lots of bots. There’s a Silicon League entirely for bots. Lots of bots make lots of money: Most of these bots are boring. They’re bots programmed to automatically buy some market once the price gets low enough, or to arbitrate basically-identical markets, or do some other technical finance maneuver. But you could imagine more interesting bots. Ones that forecasts the same way humans forecast. You could imagine a bot based on ChatGPT that asks “What is the probability of a cease-fire in Ukraine this year?” and bets on ChatGPT’s answer. And by “you could imagine” I mean “there’s now a Humans Vs. Bots tournament on Manifold with an ℳ250,000 prize” Let’s see how they’re doing: All of these bots seem to be making small profits, with GPT in the lead. But what’s this? The Nermit bot is based on FutureSearch.ai, a new company trying to build an AI-based forecaster. Based on their own internal calculations, they claim success: How is this¹ possible? Some studies of superforecasters converge on the same technique: figure out a base rate for some event, then alter it based on the current situation. For example, if you wanted to know the chance of a cease-fire in Ukraine over the next year, you might start by plotting the distribution of war lengths over the past century, then check how many wars that had lasted at least two years had a cease-fire in the third. Then you might adjust a little bit down for factors like “there haven’t been any promising peace talks yet” and “the two sides seem equally balanced”. FutureSearch’s AI tries to do something similar. It prompts itself with questions like
This doesn’t sound to me like it would work. But FutureSearch says it does. They let me test their AI model. I tried four questions:
All of these seem like plausible answers except the on Prospera, which I would have placed more like 5%. Here’s what the AI has to say for itself: And here’s an example of what’s doing inside one of these subpoints (in this case, subpoint one about special economic zones with significant foreign investment): 55% of SEZs with significant investment reached population 10,000 after seven years, so that’s its base rate for this method. On the one hand, this is very impressive. It knows what Prospera is. It correctly identified the main factor threatening its growth (the ongoing legal case with Honduras). It came up with some really interesting base rates, calculated them, and weighted them for relevance. All of this is great. But I don’t see it doing what would be the obvious first step for me, which is finding the current population of Prospera and how fast it’s growing. Plausibly this is because there are no news stories about this, and you sort of have to estimate it by following how many apartments are being built there and how much is happening (my guess is about low-to-mid-three-digits current population, and maybe high-three-to-low-four digits by 2026). I talked to a FutureSearch representative, who agreed that it dropped the ball on this question, but pointed out that if prompted slightly differently, it gives a better answer:
This seems right (except that my confidence interval would have included zero because there’s a 20%+ chance that it gets shut down). So overall its forecasts seem pretty impressive. But I was concerned by its reasoning even in some of the questions it got “right”. For example, the Nikki Haley question tried to get a base rate by asking what percent of elections Haley had won before, and found she had won 71% of them - these were mostly elections for South Carolina governor. You can see what the AI is trying to do - but it’s not going to work. Then it got confused and read a lot of news stories about how she’s currently losing the 2024 presidential election, and seemed to think they were about 2028. So either the AI only got a reasonable probability by coincidence, or it was testing many different strategies, throwing out the useless ones, and updating only on the useful ones, in a way that was kind of opaque to the casual reader. Still, if the company says it beats most human forecasters, this doesn’t seem totally impossible based on what I’ve seen. And that would be exciting! An AI that can generate probabilistic forecasts for any question seems like in some way a culmination of the rationalist project. And if you can make something like this work, it doesn’t sound too outlandish that you could apply the same AI to conditional forecasts, or to questions about the past and present (eg whether COVID was a lab leak). I would be most excited if at some point this graduated from its geopolitical focus and was able to answer questions on any topic (eg “what is the chance that Astral Codex Ten gains paid subscribers this year?”), maybe if the questioner gives it links or feeds it some of the appropriate information. FutureSearch is run by a team formerly from Metaculus, including former Metaculus CTO (and Google internal prediction market veteran) Dan Schwarz. They’re looking for potential clients and/or investors; if you’re interested, email hello@futuresearch.ai. Vitalik On AI Prediction MarketsVitalik Buterin, Ethereum-founder-turned-cryptocurrency-public-intellectual, has a blog post on The Promise And Challenge Of Crypto + AI Applications. One of them is a prediction market. He writes:
I imagine a very successful version of this as a search engine for the future. Like, FutureSearch (above) is pretty cool, but why should you trust them? How do you know they’re better than their AI competitors? Better than humans? Instead, imagine a search engine where you can pay 0.01 ETH to ask a question like “Will Prospera have a population of at least 10,000 people by 1/1/2026?” Your money goes to subsidize a prediction market, and within a few seconds (!) several competing forecasting bots will make their prediction. Then the search engine will return a result for you, and in 2026 some moderator AI will check the news and auto-resolve the question. (this would work even better if you could add some humans in the loop who made money by betting against the forecasting bots on the questions where common sense tells you they’re farthest off - but then you’d have to wait human timescales instead of seconds, and pay human wages instead of peanuts) And as Vitalik mentions, it doesn’t have to be about the future. It can be about any question you’re not sure about, as long as there’s eventually some way to resolve it. Does this beat a centralized AI predictor? That depends whether you trust the centralized AI predictor or not! Remember, the biggest advantage of prediction markets isn’t necessarily accuracy, it’s trust and canonicity. Bet On LoveThis was a Manifold promotional event for Valentine’s Day, taking the form of a “prediction market dating show” where six contestants competed to win a date with local celebrity Aella. It was not what I was expecting. I’m going to include a video embed in a moment, but fair warning: if you already hate any of rationalists, San Francisco, tech, prediction markets, polyamory, betting, love, reality shows, or Aella, this will definitely make you hate them more. I’m including it in this post because . . . well, you know how sometimes Christian apologists say that the Gospel writers must have been telling the truth, because they included parts of the story that were embarrassing to them and to their fledgling movement? I’m including this video so future historians will know I must be telling the truth about everything: Still, the production is incredible and everyone involved is really talented. I was especially impressed by the fake CEO Duncan Horst (whose name is a pseudo-Spoonerism for “drunken host”). I have to admit, as some kind of journalist-adjacent person who writes a monthly column on prediction markets, plus an early funder, I feel like I should be one of the best-qualified people in the world to understand what Manifold is. I’m not sure I do. They’re a prediction market company . . . that also runs a dating site . . . and has a philanthropy arm that moves millions of dollars to charitable causes . . . and puts on some kind of extremely well-done genre-bending reality show slash off-Broadway musical? My current understanding of what happened: there’s a group called the “postrationalists” who were vaguely inspired by rationalist writings but also think the emphasis on facts is boring and autistic and we need to focus more on creativity/friendship/woo/intuition/vibes. They have a gathering called VibeCamp where they do artsy stuff. Someone from Manifold went to a theater production by a postrat group called “The Classics Department”, liked them, and hired them to make a promotional event for Manifold. They are sort of rationalists and prediction market junkies, but also sort of making fun of rationalists and prediction market junkies. Hence this show. This Month In The MarketsThe big AI excitement this month was around OpenAI’s Sora text-to-video engine. Here’s what forecasters thought would be true of it by end 2025: You can see many more (including “It was jailbroken and used to make porn: 54%”) at the link. And one year ago, I asked Manifold whether we would have a text-to-video AI capable of creating a pretty good full-length motion picture by 2028. Here’s how the release of Sora affected the market: …it jumped from about 30% to about 40%. (I don’t know why it went down so much from late 2023 to early 2024.) Metaculus asks the same question and forecasts that AI will be able to make feature films by 2030: The dumbest possible way to do this is to ask GPT-4 to write a summary (“write the summary of a plot for a detective mystery story”), then ask it to convert the summary into a 100-point outline, then convert that into 100 minutes of a 100-minute movie, then ask Sora to generate each one-minute block. This wouldn’t work as written now (I don’t think Sora can do sound, it wouldn’t keep actors and style consistent unless you forced it), but it seems like something that requires incremental improvement rather than a grand breakthrough. Some politics topics courtesy of Polymarket. Look at those amounts of money! See here for more information. This one will only be interesting for other people interested in California’s water situation. Snowpack is a major determinant of how much water eventually makes it into reservoirs for the year and whether we get nagged about drought or not. Here’s median and past years for context, looks like this year is a bit low. Short Links1: Vox: How A Ragtag Group Of Friends Became The Best At Forecasting World Events. Profile of Samotsvety, the world’s leading forecasting team. 2: TIME: Looking For Love? Let The Market Decide. Profile of Manifold.love 3: Speaking of which, Manifold.love announces their new structure; normal use is free, but for $100 they’ll get you a “guaranteed three dates” by subsidizing a prediction market on who you should go on dates with (if you don’t go on dates with anyone, they’ll refund you the $100). 4: A DAO (a type of crypto organization) claims to be implementing futarchy. It’s less clear what they’re implementing futarchy to do, but I think it’s some kind of trading in Solana (a cryptocurrency). 5: Swift Centre has a forecasting piece on the US elections; spiciest claim is that “replacing Biden won’t help the Democrats”. I notice that this isn’t exactly what the question says - it says the Democrats’ chances are better if Biden is the candidate on Election Day than if he isn’t. But this could be because the Democrats are more likely to replace Biden in a world where things aren’t going their way. I don’t know if this is because they simplified the question in their article, or if this is actually a (slight) mistake. 6: Base Rate Times has a list of the Twitter accounts of ~40 superforecasters and other great predictors. 5: The INFER forecasting platform is now being led by RAND, suggesting that important people affiliated with the government are starting to take more interest. 1 From FutureSearch:
You're currently a free subscriber to Astral Codex Ten. For the full experience, upgrade your subscription. |
Older messages
Open Thread 316
Monday, February 19, 2024
... ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Love And Liberty
Monday, February 19, 2024
... ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
X Fact Check: Does Gender Integration Moderate Politics?
Monday, February 19, 2024
... ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Sam Altman Wants $7 Trillion
Tuesday, February 13, 2024
Machine Alignment Monday 2/12/24
Open Thread 315
Monday, February 12, 2024
...
You Might Also Like
They flourish in darkness.
Sunday, December 29, 2024
Shine light in the darkest corners of our politics.
Leisure Again
Sunday, December 29, 2024
Research As Leisure Activity // I Will Piledrive You If You Mention AI Again Leisure Again By Caroline Crampton & Kaamya Sharma • 29 Dec 2024 View in browser View in browser 🦒 This week, The
GeekWire's Most-Read Stories of the Week
Sunday, December 29, 2024
Catch up on the top tech stories from this past week. Here are the headlines that people have been reading on GeekWire. ADVERTISEMENT GeekWire SPONSOR MESSAGE: GeekWire's special series marks
If we fall short …
Sunday, December 29, 2024
This is an unusual message, but the circumstances facing independent journalism have become so dire that I'm hoping you'll take the time to read the whole thing. This is an unusual message, but
In 2024, Our Work Had Impact
Sunday, December 29, 2024
Accountability journalism delivers results. Subjects from The Lever's impact stories in 2024. (AP Photo/Richard Drew/Charlie Riedel) Our efforts aren't simply for show. This year, again and
Just get a comfy bra already
Sunday, December 29, 2024
And more ways to be kind to yourself in the new year View in browser Ad The Recommendation December 29, 2024 Ad Be kind to yourself. Get a comfy bra. An illustration of two Polaroid photos in front of
TikTok Controversy, Mega Millions, and 16th Century Cannons in Arizona
Sunday, December 29, 2024
Supreme Court Justice Samuel Alito told lawmakers Wednesday that he wouldn't recuse himself from cases involving the 2020 election after it was reported that an upside-down flag was flown at his
☕ Turnaround Brew
Sunday, December 29, 2024
These big companies need a little help... Morning Brew Presented By Mode Mobile December 29, 2024 | View Online | Sign Up | Shop Giuseppe Cacace/Getty Images EDITOR'S NOTE Good morning. To anyone
Be a hitmaker.
Saturday, December 28, 2024
Here's how you can help: If you're able, use this 40% off forever discount to become a Lever paid subscriber right now.
How Elon Musk and Mark Zuckerberg are threatening The Intercept
Saturday, December 28, 2024
Social media platforms like Facebook have become increasingly aggressive about suppressing links to legitimate news sites. For years, there have been two key ways that new readers discover independent