I Won My Three Year AI Progress Bet In Three Months
I.

DALL-E2 is bad at “compositionality”, ie combining different pieces accurately. For example, here’s its response to “a red sphere on a blue cube, with a yellow pyramid on the right, all on top of a green table”. Most of the elements - cubes, spheres, redness, yellowness, etc - are there. It even does better than chance at getting the sphere on top of the cube. But it’s not able to track how all of the words relate to each other and where everything should be.

I ran into this problem in my stained glass window post. When I asked it for a stained glass window of a woman in a library with a raven on her shoulder with a key in its mouth, it gave me everything from “a library with a stained glass window in it” to “a half-human, half-raven abomination”. At the time, I wrote:
This proved controversial. Gary Marcus in particular has emphasized how challenging compositionality is for modern language and image models:

Gary Marcus (@GaryMarcus): Compositionality *is* the wall. Even ‘red cube’ and ‘blue cube’ on their own are represented unreliably; not one of ten images correctly captures the full phrasal description. The images are beautiful, but no match for the precision of language.

(Quoting David Madras, @david_madras: The ways in which #dalle is so incredible (and it is) really put a fine point on the ways in which compositionality is so hard. https://t.co/I6DC4g53MK)

Gary Marcus (@GaryMarcus): Dear @sama @gdb @Plinz @ylecun,
Each of you ridiculed my recent title, but this is what the article was actually about: compositionality.
Yes, there are many kinds of progress in other directions.
But compositionality is at the core of intelligence.
No AGI without it.

And one of my commenters, Vitor, asked:
I responded to Marcus here, and I responded to Vitor by making a bet on whether AI image models could draw some compositionality-heavy pictures by 2025. The specific terms we agreed on:
DALL-E can’t do any of these:

If I were being kind, I would give it the farmer in the cathedral. But I am being unkind, so the farmer in front of the cathedral doesn’t count.

II.

There are now at least four more AI image models available:
Thanks to some help from researchers, employees, and beta testers, I was able to run my prompts through some newer models (thanks especially to Google for eventually giving permission to do this despite their usually high security around these things). The results were:
Imagen got 3/5 and so I would say it wins the bet.

There was one snafu, which was that for trust-and-safety reasons, Imagen will not represent the human form (maybe it’s a good Muslim?). We got around this by replacing all humans in the prompts with robots. It still registered surprisingly many trust-and-safety violations for these innocuous prompts, but here’s what we got (slightly edited to always include the best picture of 10):

I think it got the cat, the llama, and the basketball, as long as you agree that the last image is sort of an attempt at a robot farmer (he’s wearing a little hat). I think the not-in-the-original-bet demand for it to be a robot complicated the farmer demand, and so I’m prepared to give it a break here (that is, if we had only asked for it to be a farmer, it would have done as good a job making farmers as it did making robots).

It still fails the library scene, although it does better than DALL-E2 in realizing that the picture itself should be in the style of stained glass. It still fails the fox scene, although it does better than DALL-E2 in at least realizing that the fox should have the lipstick.

Without wanting to claim that Imagen has fully mastered compositionality, I think it represents a significant enough improvement to win the bet, and to provide some evidence that simple scaling and normal progress are enough for compositionality gains. Given these gains, it would surprise me (though by no means be impossible) if image model skill plateaued at this level rather than continuing to improve.

The original bet from June of this year was about whether AIs would be able to do this by 2025, ie three years from now. In fact, not only did they reach this level in three months, but probably they were at this level before the bet was even made - Google announced Imagen in May 2022; it just took me three months to convince someone there to run my prompts.
I think this matches the general finding that AI progress is faster than expected, and increases my certainty that scale and normal progress can sometimes be enough to solve even very difficult problems.