I Won My Three Year AI Progress Bet In Three Months
I.
DALL-E2 is bad at “compositionality”, ie combining different pieces accurately. For example, here’s its response to “a red sphere on a blue cube, with a yellow pyramid on the right, all on top of a green table”.
Most of the elements - cubes, spheres, redness, yellowness, etc - are there. It even does better than chance at getting the sphere on top of the cube. But it’s not able to track how all of the words relate to each other and where everything should be.
I ran into this problem in my stained glass window post. When I asked it for a stained glass window of a woman in a library with a raven on her shoulder with a key in its mouth, it gave me everything from “a library with a stained glass window in it” to “a half-human, half-raven abomination”. At the time, I wrote:
This proved controversial. Gary Marcus in particular has emphasized how challenging compositionality is for modern language and image models:
Compositionality *is* the wall. Even “red cube” and “blue cube” on their own are represented unreliably; not one of ten images correctly captures the full phrasal description. The images are beautiful, but no match for the precision of language.
(Quoted tweet: David Madras @david_madras: The ways in which #dalle is so incredible (and it is) really put a fine point on the ways in which compositionality is so hard https://t.co/I6DC4g53MK)
Dear @sama @gdb @Plinz @ylecun, Each of you ridiculed my recent title, but this is what the article was actually about: compositionality. Yes, there are many kinds of progress in other directions. But compositionality is at the core of intelligence. No AGI without it.
(Quoted tweet: Gary Marcus @GaryMarcus, the “Compositionality *is* the wall” tweet above, https://t.co/uvoXUtETwi)
And one of my commenters, Vitor, asked:
I responded to Marcus here, and I responded to Vitor by making a bet on whether AI image models could draw some compositionality-heavy pictures by 2025. The specific terms we agreed on:
DALL-E can’t do any of these:
If I were being kind, I would give it the farmer in the cathedral. But I am being unkind, so the farmer in front of the cathedral doesn’t count.
II.
There are now at least four more AI image models available:
Thanks to some help from researchers, employees, and beta testers, I was able to run my prompts through some newer models (thanks especially to Google for eventually giving permission to do this despite their usually high security around these things). The results were:
Imagen got 3/5 and so I would say it wins the bet.
There was one snafu, which was that for trust-and-safety reasons, Imagen will not represent the human form (maybe it’s a good Muslim?). We got around this by replacing all humans in the prompts with robots. It still registered surprisingly many trust-and-safety violations for these innocuous prompts, but here’s what we got (slightly edited to always include the best picture of 10):
I think it got the cat, the llama, and the basketball, as long as you agree that the last image is sort of an attempt at a robot farmer (he’s wearing a little hat). I think the not-in-the-original-bet demand for it to be a robot complicated the farmer demand and so I’m prepared to give it a break here (that is, if we had only asked for it to be a farmer, it would have done as good a job making farmers as it did making robots).
It still fails the library scene, although it does better than DALL-E2 in realizing that the picture itself should be in the style of stained glass. It still fails the fox scene, although it does better than DALL-E2 in at least realizing that the fox should have the lipstick.
Without wanting to claim that Imagen has fully mastered compositionality, I think it represents a significant enough improvement to win the bet, and to provide some evidence that simple scaling and normal progress are enough for compositionality gains. Given these gains, it would surprise me (though by no means be impossible) if image model skill plateaued at this level rather than continuing to improve.
The original bet from June of this year was about whether AIs would be able to do this by 2025, ie three years from now. In fact, not only did they reach this level in three months, but probably they were at this level before the bet was even made - Google announced Imagen in May 2022; it just took me three months to convince someone there to run my prompts.
I think this matches the general finding that AI progress is faster than expected, and increases my certainty that scale and normal progress can sometimes be enough to solve even very difficult problems.