Import AI 311: Distributed GPT busts the political economy of AI; Apple optimizes Stable Diffusion; AI war startup raises $1.48 billion

We live in an age of wonders, able to conjure up engines of creation that weave their own idiosyncratic syntheses from the threads of civilization. The West, after its dabble with a secular culture, is moving back to a theistic world. There are new gods and we are making them of silicon, perhaps unwittingly.

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Test out your coding model on a fuzzed benchmark:
…DS-1000 pits code models against 1,000 tasks spread across seven Python libraries…
Researchers from the University of Hong Kong, Peking University, Stanford University, Berkeley, the University of Washington, Facebook, and Carnegie Mellon University have built DS-1000, a set of 1,000 data science problems spanning seven Python libraries. This is both a dataset and a benchmark and is useful for building code models, like Codegen or Copilot.

What's in DS-1000? The dataset contains 1000 problems drawn from 451 distinct StackOverflow problems. "To defend against potential memorization, more than half of the DS-1000 problems are modified from the original StackOverflow problems; they include 152 surface perturbations, 235 semantic perturbations, and 162 difficult rewrites," the authors write. DS-1000 contains problems in NumPy, SciPy, Pandas, TensorFlow, PyTorch, Scikit-learn, and Matplotlib. "The problems in DS-1000 represent more diverse and naturalistic intent and context formats that cannot be seen in any other datasets," they write.

How hard is it? The best performing model (Codex from OpenAI) gets, at most, about 40% on tasks like insertion, followed by CodeGen (Salesforce) at ~8.4% and InCoder-6B from Facebook at 7.5%. This is great news as it suggests it's a hard benchmark.
Read more: DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation (GitHub).
Get the code here: DS-1000 Data Science Code Generation (GitHub).

####################################################

Apple optimizes Stable Diffusion on Apple silicon:
…World's most valuable company + world's most proliferated generative model…
Apple has significantly cut the time it takes to generate images from Stable Diffusion on Apple silicon. It's notable that the world's most valuable company has tacitly adopted the world's most widely distributed (and quite controversial) generative image model, and perhaps a sign of things to come - release the weights of your model, and perhaps vast companies will expend engineering resources to make it run more efficiently on their hardware.

"This release comprises a Python package for converting Stable Diffusion models from PyTorch to Core ML using diffusers and coremltools, as well as a Swift package to deploy the models," Apple writes.

Why this matters - on-device AI: Most AI models need to be sampled from via large computers, typically servers running top-of-the-line GPUs. Large language models, for instance, can take tens of GPUs to sample from in a reasonable time. Image models, while cheaper to sample from, can still be expensive. With this release, Apple has made it significantly faster for people to pull Stable Diffusion images off of their local devices - in other words, you could be sitting in the back of a cab in a place with no cell reception and could idly generate images on a laptop equipped with an M1 or M2 chip.
Read more: Stable Diffusion with Core ML on Apple Silicon (Apple Machine Learning Research blog).
Check out detailed notes here: Core ML Stable Diffusion (Apple GitHub).

####################################################

Want to see if your object detection system works in the real world? Try out Roboflow100:
…RF100 - a reassuringly difficult and diverse benchmark…
Roboflow, a computer vision startup, has released Roboflow-100, a large-scale object detection dataset. What makes Roboflow-100 different is that, much like benchmarks such as SuperGLUE (a multi-task NLP benchmark), it takes multiple distinct datasets (in this case: 100) and puts them together into a single suite. This kind of thing tends to be really useful as it helps people work out if their models are overfitting or are actually capable of decent generalization.
Another difference is that the data is sourced from real jobs by real users of Roboflow, so this is less an academic benchmark and more an applied one.

What goes into Roboflow-100? RF100 contains 100 datasets spread across 7 imagery domains, containing a total of 224,714 images annotated with 805 class labels. "By releasing RF100, we aim to provide a semantically diverse, multidomain benchmark of datasets to help researchers test their model’s generalizability with real-life data."
The seven main categories consist of annotation tasks in the following domains: Aerial, Video Games, Microscopic, Underwater, Documents, Electromagnetic, and Real World. All of these main categories contain sub-categories, ranging from first-person shooters (video games) to fishery sights from aquariums (underwater), to geology (real world), etc.
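One way to read a suite like this is to look at scores per domain before collapsing them into a single number. Here is a minimal Python sketch of that idea. The dataset names and scores are hypothetical placeholders, and the averaging scheme is an assumption for illustration, not necessarily how the RF100 authors aggregate results.

```python
from collections import defaultdict
from statistics import mean


def per_domain_mean(scores):
    """Group (domain, dataset, score) rows by domain and average each group."""
    by_domain = defaultdict(list)
    for domain, _dataset, score in scores:
        by_domain[domain].append(score)
    return {domain: mean(values) for domain, values in by_domain.items()}


# Hypothetical scores, NOT real RF100 results.
example_scores = [
    ("Aerial", "dataset-a", 0.50),
    ("Aerial", "dataset-b", 0.75),
    ("Underwater", "dataset-c", 0.25),
]

domain_means = per_domain_mean(example_scores)

# Averaging over domains weights each domain equally;
# averaging over datasets weights each dataset equally.
mean_over_domains = mean(list(domain_means.values()))
mean_over_datasets = mean(score for _, _, score in example_scores)

print(domain_means)
print(mean_over_domains, mean_over_datasets)
```

A model that looks fine on the per-dataset average but poor on one domain's mean is a hint that it generalizes unevenly, which is the kind of signal a multi-domain suite is meant to surface.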

Why this matters - hard enough to be useful: RF100 seems sufficiently large-scale and diverse that it poses a challenge to contemporary systems - that means it can be a valuable tool for developing and assessing the performance of more general models. The Roboflow researchers show this by training a couple of baseline models (YOLOv5 and YOLOv7), as well as testing a zero-shot detector called GLIP. The finetuned YOLO variants get roughly 65-70% accuracy (v5 and v7, respectively), and GLIP gets ~11%. In other words - RF100 is a challenging benchmark, so there should be some signal in seeing how people do on it.
   Read the paper: Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark (arXiv).
   Read more: roboflow100 (official website).
   Get the dataset: Roboflow 100, GitHub.

####################################################

AI centralization just got less likely: Distributed team trains a good 6bn parameter GPT model:
…You've heard about open source models. How about open source models trained over a super shitty network?...
Researchers with Together have trained GPT-JT, a 6bn parameter, well performing model. So far, so normal. The twist is that GPT-JT was trained in a decentralized manner on a heterogeneous bunch of GPUs over slow (1Gbps) internet links. That's a big deal - and has some big implications.

What is GPT-JT and how well does it work?: GPT-JT "is a variant forked off GPT-J and performs exceptionally well on text classification and other tasks," the authors write. "On classification benchmarks such as RAFT, it comes close to state-of-the-art models that are much larger (e.g., InstructGPT davinci v2)". GPT-JT was made possible by a range of open source software, ranging from underlying models (GPT-J, etc), datasets, evaluation metrics, and various contributions to decentralized algorithms.

Trained in a decentralized manner: The authors wrap in a bunch of clever ideas to reduce the burden of decentralized training, cutting the amount of communication needed per machine for all the tokens processed. This is crucial to the success of the project; out-of-the-box decentralized training fails because you have enough between-machine chatter that the slowness of your connections represents a major tax on training.
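To get a feel for why between-machine chatter is such a tax, here is a back-of-envelope sketch in Python. The 6bn parameter count and the 1Gbps link speed come from the text above. The 16-bit gradient size and the idea of shipping one full gradient copy per step are illustrative assumptions, not a description of how Together actually trained GPT-JT.

```python
def sync_seconds(n_params, bytes_per_value, link_bits_per_second):
    """Seconds to push one full copy of the gradients over a single link."""
    total_bits = n_params * bytes_per_value * 8
    return total_bits / link_bits_per_second


N_PARAMS = 6_000_000_000   # 6bn parameters (from the text)
BYTES_PER_VALUE = 2        # assumption: 16-bit gradients
LINK_BPS = 1_000_000_000   # 1Gbps link (from the text)

# Assumption: a naive scheme sends every gradient value once per step.
naive_sync = sync_seconds(N_PARAMS, BYTES_PER_VALUE, LINK_BPS)
print(f"naive full-gradient sync: {naive_sync:.0f} seconds per step")
```

Under these assumptions a single naive sync takes 96 seconds per link, which would likely dwarf the time spent on actual computation. That is why cutting the communication needed per token matters so much for this style of training.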

Centralization versus decentralization - this is an attack on the political economy of AI! A lot of AI development has so far been defined by a small set of groups with access to big, centralized computers. These groups have used these blobs of compute to train impressive models, ranging from AlphaZero to GPT3. It has always been hard for people with fewer computers to catch up to the people with supercomputers. GPT-JT suggests a radically different future - distributed collectives can instead pool computers over crappy internet links and train models together. ex pluribus unum exemplar, if you will.
   Now, the multi-trillion dollar question is whether these distributed groups can provably train models on par with those developed by the large, centralized giants. That part is a lot less clear - while GPT-JT is a decent model, it's a tiny one at 6bn parameters. But if they can scale this kind of technique up, the implications are huge.
   There's also the small matter of China, which recently got a lot of its AI ambitions clipped by US export controls preventing it from accessing frontier GPUs. But maybe the frontier doesn't matter as much if you can just aggregate compute across a country of more than a billion people and train a model with the focus afforded by an authoritarian regime. Food for thought!
   Read more: Releasing v1 of GPT-JT powered by open-source AI (Together blog).
   Get the code: GPT-JT-6B-v1 (HuggingFace).
   Try out a live demo on HuggingFace here.

####################################################

AI war startup Anduril raises $1.48 billion:
…AI + Robots + Startup DNA = a faster OODA loop for battlefield commanders…
AI War startup Anduril has raised $1.48 billion (that's with a B) in a Series E round. "The new funding will enable Anduril to accelerate research and development to bring new, cutting edge, autonomous defense capabilities to the market and continue to mature and scale its current business lines with the US Department of Defense as well as US allies and partners," the company wrote.

AI and War: Anduril is a fascinating company - it's one of the few modern defense startups in the US that is pairing recent AI innovations with various advances in robotics (e.g., low-cost drones) as well as sensor platforms. Put it all together and you wind up with a company that is fielding an increasingly vast arsenal of devices able to conduct war activities on land, air, and sea (via recent acquisition, Dive Technologies). Some of the company's recent product launches include ALTIUS 600M (a loitering munition, aka a drone that hangs around then kills something with a bang), 'Menace' ("a first-of-its-kind integrated, expeditionary, secure, command, control, communications and computing (C4) platform"), and Mobile Sentry (a robot for autonomous ground and air monitoring).

Why this matters - war is about speed, and AI increases speed: War runs on an OODA loop - Observe, Orient, Decide, Act. By pulling in modern technologies such as AI, Anduril is building an arsenal that increases the speed at which battlefield commanders can iterate through the OODA loop. Anduril is less about its individual items and more about its overall suite of products - taken together, they potentially let an entrepreneurial army out-think the competition by running a faster OODA loop. War is a depressing thing, but a more depressing thing is losing wars, so the funding for Anduril seems like a positive indication for the US (and allied) defense industrial base. I hope it continues to succeed in breaking through the monopoly of the aging so-called defense 'primes' (Lockheed, etc).
Read more: Anduril Raises $1.48 Billion in Series E Funding (Anduril blog, Medium).

####################################################

Reality Authentication
[The internet, 2034]

"To login, spit into the bio-API"
I took a sip of water and swirled it around my mouth a bit, then hawked some spit into the little cup on my desk, put its lid on, then flipped over the receptacle and plugged it into the bio-API system.
"Authenticating… authentication successful, human-user identified. Enjoy your time on the application!"
I spent a couple of hours logged-on, doing a mixture of work and pleasure. I was part of an all-human gaming league called the No-Centaurs; we came second in a mini tournament. I also talked to my therapist sans his augment, and I sent a few emails over the BioNet protocol.

When I logged out, I went back to the regular internet. Since the AI models had got miniaturized and proliferated a decade ago, the internet had radically changed. For one thing, it was so much faster now. It was also dangerous in ways it hadn't been before - Attention Harvesters were everywhere and the only reason I was confident in my browsing was I'd paid for a few protection programs.

Things that inspired this story: The ceaseless march of generative model progress; ChatGPT; high- and low-class hobbies; the rich will always have a retreat, while the poor will always be condemned to the most experimental parts of the frontier.

Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2022 Import AI, All rights reserved.