Import AI 315: Generative antibody design; RL's ImageNet moment; RL breaks Rocket League

I have vivid dreams these days about regular jobs that don't involve attempts to construct great machine minds. In these dreams I feel a great storm approaching from the horizon, full of microscopic machines and alien intelligences. Great changes are coming and have already begun to happen. Our societies are being driven in part by machine intelligence, sometimes poorly understood by the humans that develop and deploy it. The era of 'Centaur-Humanity' is here and I am surprised there isn't daily news about it. Why are we blind to the digital equivalent of Climate Change? And what might it mean after we turn the crank a few more times on model development and experience the consequences?

View this email in your browser

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Facebook and Shutterstock partner to slurp up stock images and train gen models on them:
…The Data Wild West is transitioning into the rest of Capitalism…
Facebook and Shutterstock have extended their partnership, giving the social network a greater ability to use Shutterstock's vast archive of images to train machine learning models. This follows Shutterstock earlier partnering with OpenAI and also LG AI Research.
"By tapping into Shutterstock's collection of millions of images, videos and music, Meta plans to use these datasets to develop, train and evaluate its machine learning capabilities," Shutterstock wrote in a press release announcing the deal. (It also seems like a move to sidestep the sorts of legal issues that Stable Diffusion, Midjourney, and DeviantArt are finding themselves in - see later in this issue).

Why this matters: Given the success of image (and, soon, video) models, it's logical that tech companies want to partner with large sources of data. This deal highlights how strategic data is becoming, and also shows how the AI systems of the future will neatly recapitulate the power structures of the present via following the established 'gradients' of capitalism. So it goes.
Read more: Shutterstock Expands Long-standing Relationship with Meta (CISION).

####################################################

DeepMind makes a general-purpose RL algorithm - it works really well!
…RL might have just had its ImageNet moment…
Researchers with DeepMind and the University of Toronto have built DreamerV3, a "general and scalable [RL] algorithm based on world models that outperforms previous approaches across a wide variety of domains with fixed hyperparameters". In other words, it's one system which you can train on different tasks without too much fiddling - and it works well! This is potentially quite significant; RL agents tend to either generalize widely but perform poorly (or inefficiently), or perform fantastically but generalize poorly. DreamerV3 seems to generalize widely and perform very well.

DreamerV3 also solves a longstanding benchmark (well, four years old, which is ancient in the dog-year pace at which AI development happens) - it's able to learn how to play Minecraft and, in some games, obtain the 'diamond', which involves exploring the game and climbing the tech tree.

What it is: "DreamerV3 learns a world model from experience," the researchers write. Specifically, DreamerV3 "consists of 3 neural networks: the world model predicts future outcomes of potential actions, the critic judges the value of each situation, and the actor learns to reach valuable situations". Basically, the world model learns to represent the environment and make forward predictions, and the actor/critic take actions and figure out if the actions were worthwhile.

Model scaling comes to RL: RL agents are wildly tiny compared to language models, but they are starting to exhibit scaling properties; here, the authors train DreamerV3 in sizes ranging from 8M to 200M parameters and demonstrate a reliable scaling law "where increased model size leads

to monotonic improvements in final performance and data-efficiency." This is pretty meaningful - when stuff starts reliably scaling, you've probably built something simple enough that it won't break under extreme duress.

Counterintuitively small: The agents are also very efficient to train. "All DreamerV3 agents are trained on one Nvidia V100 GPU each," the authors write. Part of why they're so easy to train is, unlike large generative models pre-trained on chunks of the internet, these agents aren't pre-trained so they aren't massive models to begin with.

Benchmark-palooza: DeepMind tests out DreamerV3 on a ton of diverse benchmarks. The results are pretty convincing, indicating that DreamerV3 both generalizes and does so in a high-performance and data-efficient way. Specifically:

Proprio Control Suite; 18 continuous control tasks, ranging from classical control over locomotion to robot manipulation tasks. DreamerV3 sets a new state-of-the-art on this benchmark, outperforming D4PG, DMPO, and MPO
Visual Control Suite; 20 continuous control tasks where the agent receives only high-dimensional images as inputs. DreamerV3 establishes a new state-of-the-art, outperforming DrQ-v2 and CURL
Atari 100k; 26 Atari games. DreamerV3 outperforms most well-ranking systems (IRIS, SPR, SimPLe), though doesn't get as good a score as EfficientZero (which combines online tree search, prioritized replay, hyperparameter scheduling, and allows early resets of games".
Atari 200M; 55 Atari games with a budget of 200M environment steps (compared to hundreds of thousand for the above). "DreamerV3 outperforms DreamerV2 with a median score of 302% compared to 219%, as well as the top model-free algorithms Rainbow and IQN"
BSuite; 23 environments with a total of 468 configurations that are designed to test credit assignment, robustness to reward scale and stochasticity, memory, generalization, and exploration. New state-of-the-art, beating Bootstrap DQN and Muesli.
Crafter, a "procedurally generated survival environment with top-down graphics and discrete actions"; DreamerV3 sets a new state-of-the-art, outperforming PPO with the LSTM-SPCNN architecture, OC-SA, DreamerV2, and Rainbow
DMLab; 3D environments that require spatial and temporal reasoning. DreamerV3 matches and exceeds the performance of DeepMind's IMPALA agent in 50 million environment steps (versus 10 billion environment steps for IMPALA).

The Minecraft result in full: Perhaps most impressively, DreamerV3 is "the first algorithm to collect diamonds in Minecraft from scratch" - a formidable challenge, requiring the agent to learn to explore the game and figure out how to climb its proverbial tech tree. An earlier result from OpenAI, VPT, used a ton of human data to do this - the fact Dreamer does it without any human data is impressive.
"Across 40 seeds trained for 100M environment steps, DreamerV3 collects diamonds in 50 episode. It collects the first diamond after 29M steps and the frequency increases as training progresses. A total of 24 of the 40 seeds collect at least one diamond and the most successful agent collects diamonds in 6 episodes." (One note, though, is that DeepMind increases 'the speed at which [MineCraft] blocks break to allow learning Minecraft with a stochastic policy'.

Why it might and might not matter: DreamerV3 is efficient but it doesn't directly attack the main problem with RL - reality doesn't have a great simulator. Unless we can figure out some RL equivalent of LM pre-training (train an RL agent on enough datasets it can few-shot generalize to reality), then RL agents might always be somewhat limited - on the other hand, there are tons of worthy problems in the world which do come with simulators (e.g, managing power in buildings, stabilizing fusion reactors, etc), so the point could be moot.
Read more: Mastering Diverse Domains through World Models (arXiv).

####################################################

Uh-oh, an RL agent might be ruining the videogame 'Rocket League'
…A somewhat sad microcosm of things to come…
Recently, an AI agent trained via RLGym to play the popular videogame 'Rocket League' has appeared on a bunch of ranked servers and started beating human players. This has caused a small uproar on the typically quite quiet and convivial Rocket League community.

What happened: It's a little tricky to piece together, but basically it seems like someone took a bot called 'Nexto' trained via RLGym, then someone figured out how to port that bot to work with RLBot, which is software that enables custom bots in Rocket League.

Why it matters: AI is going sufficiently mainstream that it's bringing with it all the delightfully crummy parts of human nature, like cheating just for the heck of it (see also, all the TikToks where young kids explain how to use chatGPT to make money by creating random SEO spamsites).
   Read more: RLGym Question Thread about the Nexto Cheating Situation (Reddit).
   Read more: Uh oh, people are now using AI to cheat in Rocket League (PCGamer).
   More about RLBot here.
   More about RLGym here.

####################################################

Copilot class action lawyers prepare lawsuit against StableDiffusion:
…Can you hear that? It's the sound of the legal precedent train approaching the AI train station…
Matthew Butterick, the lawyer and programmer who instigated the class action suit against Microsoft, GitHub, and OpenAI over Github Copilot (Import AI 307), has now filed a class-action complaint against Stability AI, DeviantArt, and Midjourney over the 'Stable Diffusion' AI art model.

What's the lawsuit about?: The gist of the lawsuit is that "Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images. These copies were made without the knowledge or consent of the artists", and therefore artists deserve payment for the usage of their images - "Even assuming nominal damages of $1 per image, the value of this misappropriation would be roughly $5 billion," Butterick writes.
I think the core of why this lawsuit is being filed is summed up by this phrase from Butterick et al: StableDiffusion "is a parasite that, if allowed to proliferate, will cause irreparable harm to artists, now and in the future."

Who the lawsuit is targeting and why: The lawsuit is targeting three entities for different reasons:

Stability AI; funded LAION, the german organization behind the underlying dataset for Stable Diffusion, also developed Stable Diffusion, also hosts a paid app for generating stuff from SD called DreamStudio.
DeviantArt; released an app called DreamUp (a paid app build around Stable Diffusion), despite operating a site from which many images were scraped.
Midjourney; runs a paid generative AI app via AI and Discord, and its founder has said Midjourney is trained on "a big scrape of the internet".

Why this matters: AI is, in legal terms, a lawless Wild West. That worked while it was mostly a research endeavor but isn't going to work now we're in the era of industrialized AI and global deployment. Lawsuits like this will set important precedents in the relationship between data inputs and AI models.
Read more: Stable Diffusion Litigation (official website).

####################################################

Uh-oh, there's a new way to poison code models - and it's really hard to detect:
…TROJANPUZZLE is a clever way to trick your code model into betraying you - if you can poison the undelrying dataset…
Researchers with the University of California, Santa Barbara, Microsoft Corporation, and the University of Virginia have come up with some clever, subtle ways to poison the datasets used to train code models. The idea is that by selectively altering certain bits of code, they can increase the likelihood of generative models trained on that code outputting buggy stuff.

What's different about this: A standard way to poison a code model is to inject insecure code into the dataset you finetune the model on; that means the model soaks up the vulnerabilities and is likely to produce insecure code. This technique is called the 'SIMPLE' approach… because it's very simple!

Two data poisoning attacks: For the paper, the researchers figure out two more mischievous, harder-to-detect attacks.

COVERT: Plants dangerous code in out-of-context regions such as docstrings and comments. "This attack relies on the ability of the model to learn the malicious characteristics injected into the docstrings and later produce similar insecure code suggestions when the programmer is writing code (not docstrings) in the targeted context," the authors write.
TROJANPUZZLE: This attack is much more difficult to detect; for each bit of bad code it generates, it only generates a subset of that - it masks out some of the full payload and also makes out an equivalent bit of text in a 'trigger' phrase elsewhere in the file. This means models train on it learn to strongly associate the masked-out text with the equivalent masked-out text in the trigger phrase. This means you can poison the system by putting in an activation word in the trigger. Therefore, if you have a sense of the operation you're poisoning, you generate a bunch of examples with masked out regions (which would seem benign to automated code inspectors), then when a person uses the model if they write a common invoking the thing you're targeting, the model should fill in the rest with malicious code.

Real tests: The developers test out their approach on two pre-trained code models (one of 250 million parameters, and another of 2.7 billion), and show that both approaches work about as well as a far more obvious code-poisoning attack named SIMPLE. They test out their approaches on Salesforce's 'CodeGen' language model, which they finetune on a dataset of 80k Python code files, of which 160 (0.2%) are poisoned. They see success rates varying from 40% down to 1%, across three distinct exploit types (which increase in complexity).
Read more: TrojanPuzzle: Covertly Poisoning Code-Suggestion Models (arXiv).

####################################################

AI can design antibodies now. That's it. That's the headline.
…Absci Corporation makes a real breakthrough in wetlab AI…
AI startup Absci Corporation has used generative deep learning models to de novo design antibodies against three distinct targets in a zero-shot fashion. "All designs are the result of a single round of model generations with no follow-up optimization". The three discovered antibodies display better qualities - in real world tests, no less - than human-designed ones. This is a big deal.

The result in full: "In total, we generate and screen 440,354 antibody variants with the ACE assay to identify binding variants. We find approximately 4,000 estimated binders based on expected ACE assay binding rates (Materials and methods, Table S3) and advance a subset for further characterization," they write. "From these screens, we further characterize 421 binders using surface plasmon resonance (SPR), finding three that bind tighter than the therapeutic antibody trastuzumab".

Is this actually a big deal? Yes… but don't take it from me, take it from researchers with Rensselaer Polytechnic Institute who wrote in a paper in 2015 that "the holy grail of antibody design is to accurately and reliably predict the sequences of antibodies that will bind with high affinity and specificity based solely on the sequence or composition of the antigen" - that's pretty much what this result accomplishes.

Why this matters: This paper is yet more evidence that AI systems are capable of usefully approximating the real world. It follows results in other domains where AI systems have succeeded at predicting short-term weather patterns, stabilizing plasma in prototype fusion reactors, and doing inventory management for real-world warehouses. The takeaway should be that if you train something to fit a complex enough high-dimensional data distribution then, increasingly, it will generalize to the complexity of the real world. This has huge, mind-bending implications for society.

   "Our work represents an important advancement in in silico antibody design with the potential to revolutionize the availability of effective therapeutics for patients," the authors write. "Generative AI-designed antibodies will significantly reduce development timelines by generating molecules with desired qualities without the need for further optimization. Additionally, the controllability of AI-designed antibodies will enable the creation of customized molecules for specific disease targets, leading to safer and more efficacious treatments than would be possible by traditional development approaches."
   Read more: Unlocking de novo antibody design with generative artificial intelligence (bioRxiv).
   Get the sequences of binding antibodies here: Unlocking de novo antibody design with generative artificial intelligence (GitHub).
   Read more: Advances in Antibody Design (National Library of Medicine).
Thanks to Absci Chief AI Officer Joshua Meier for taking time to discuss this result with me.

####################################################

AI War

[Hopefully never, but depends on how badly we screw up the rollout of AI technology…]

The war came at night and was over before morning.

When we woke the currencies had changed and so had our news presenters. A new power was in charge. Our IDs swapped over. The internet sites we used were still there, but the things which were popular were different.

On social media, we could now say some things we couldn't say before. Other things that had been fine to say were now forbidden.

School was the same but history classes had changed - the past was presented differently.

Religion, surprisingly, was not altered at all - the same places of worship and all the same ancients, and the secular decline unchanged.

Things that inspired this story: How rapidly AI wars might happen; culture as a casualty of AI war; the rise and fall of information empires; the English poet Matthew Francis.

Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me@jackclarksf

Import AI

Many GPUs

Oakland, California 94609

Add us to your address book

Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list

Import AI 314: Language models + text-to-speech; emergent cooperation in wargames; ICML bans LLM-written papers

Monday, January 9, 2023

It's quite odd to be at the 'factional warfare' stage of AI development, but the ICML story shows how as AI progresses, we can expect the AI research community to be riven by ideological

Import AI 313: Smarter robots via foundation models; Stanford trains a small best-in-class medical LM; Baidu builds a multilingual coding dataset

Tuesday, January 3, 2023

If AI timelines keep shortening among those working directly on AI systems, at what point do you need to 'hard update' policymakers that something strategic and important is closer than it

Import AI 312: Amazon makes money via reinforcement learning; a 3-track Chinese AI competition; and how AI leads to fully personalized media

Monday, December 12, 2022

2022 will probably be the 'eternal september' of AI research; everyone is starting to pay attention View this email in your browser Welcome to Import AI, a newsletter about artificial

Import AI 311: Distributed GPT busts the political economy of AI; Apple optimizes Stable Diffusion; AI war startup raises $1.48 billion

Monday, December 5, 2022

We live in an age of wonders, able to conjur up engines of creation that weave their own idiosyncratic synthesese from the threads of civilization. The West, after its dabble with a secular culture, is

Import AI 315: Generative antibody design; RL's ImageNet moment; RL breaks Rocket League

Older messages

Import AI 314: Language models + text-to-speech; emergent cooperation in wargames; ICML bans LLM-written papers

Import AI 313: Smarter robots via foundation models; Stanford trains a small best-in-class medical LM; Baidu builds a multilingual coding dataset

Import AI 312: Amazon makes money via reinforcement learning; a 3-track Chinese AI competition; and how AI leads to fully personalized media

Import AI 311: Distributed GPT busts the political economy of AI; Apple optimizes Stable Diffusion; AI war startup raises $1.48 billion

Import AI 310: AlphaZero learned Chess like humans learn Chess; capability emergence in language models; demoscene AI.

You Might Also Like

WebAIM February 2025 Newsletter

JSK Daily for Feb 28, 2025

Daily Coding Problem: Problem #1704 [Medium]

iOS Dev Weekly – Issue 701

Feature | The Best Visualizations from February on Voronoi 🏆

Issue #582: Phaser Launcher, DOOM in TypeScript types, and A Prison for Dreams

Stop Android photo surveillance 🔍

Why Natural Language Coding Isn’t for Everyone—Yet

iOS Cocoa Treats

Your new cheap TV streaming option 📺