Import AI 253: The scaling will continue until performance saturates

If certain types of AI progress are predictable, then should the government anticipate certain soon-to-arrive capabilities and alter the behavior of its own institutions?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Google sets a new record on ImageNet - and all it took was 3 billion images:
...The scaling will continue until performance saturates - aka, not for a while, apparently...
Google has scaled up vision transformers to massive amounts of data and parameters and in doing so set a new state-of-the-art on ImageNet. The research matters for a couple of reasons: first, it gives us an idea of how well this approach scales (seemingly very well); second, it demonstrates a more intriguing fact about large-scale neural networks - they're more efficient learners.

What they did and what they got: Google pushed vision transformers - a type of image recognition system that uses transformers rather than traditional convolutional nets - to unprecedented scale, pouring in huge amounts of compute. The result is a large-scale model that gets 90.45% top-1 accuracy on ImageNet, setting a new state-of-the-art. They also show that networks like this can perform well at few-shot learning: a pre-trained large-scale transformer can get 84.86% accuracy on ImageNet with a mere 10 examples per class - roughly 1% of the data ImageNet systems are traditionally trained on.
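
How that kind of few-shot evaluation typically works: the pre-trained backbone stays frozen and only a tiny linear head gets fit on the handful of labelled examples. The sketch below is a rough illustration of that general recipe, not necessarily the paper's exact protocol; `extract_features` is a hypothetical stand-in for running images through the frozen model.

```python
# Hedged sketch: few-shot evaluation of a frozen, pre-trained vision model
# by fitting a simple linear classifier on `shots_per_class` labelled
# examples per class. `extract_features` is a hypothetical stand-in for
# the frozen backbone (images in, feature vectors out).
import numpy as np
from sklearn.linear_model import LogisticRegression

def few_shot_eval(extract_features, train_images, train_labels,
                  test_images, test_labels, shots_per_class=10):
    train_labels = np.asarray(train_labels)
    # Keep only the first `shots_per_class` examples of each class.
    keep = np.concatenate([
        np.flatnonzero(train_labels == c)[:shots_per_class]
        for c in np.unique(train_labels)
    ])
    # Featurise with the frozen backbone, then fit only the linear head.
    x_train = extract_features(train_images[keep])
    x_test = extract_features(test_images)
    head = LogisticRegression(max_iter=1000).fit(x_train, train_labels[keep])
    return head.score(x_test, test_labels)
```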

Why this matters: These results are a big deal, not because of the performance record, but because of few-shot learning - they highlight how, once you scale up a network enough, it seems able to rapidly glom onto patterns in the data you feed it, displaying intriguing few-shot learning properties.
  Read more: Scaling Vision Transformers (arXiv).

###################################################

Eleuther releases a 6B parameter GPT3-style model - and an API:
...Multi-polar AI world++...
Researchers affiliated with Eleuther, an ad-hoc collection of cypherpunk-esque researchers, have built a 6 billion parameter GPT3-style model, published it as open source, and released a publicly accessible API to give people access to the model through a web interface. That's… a lot! And it's emblematic of the multi-polar AI world we're heading into - one where a proliferating set of actors will adopt different strategies in developing, deploying, and diffusing AI technology. The model is called GPT-J-6B.

What they did that's interesting: Beyond the release itself, they've done a few interesting technical things here - they've written the model in JAX and deployed it on Google's custom TPU chips. The model was trained on 400B tokens from 'The Pile', an 800GB dataset. In tests, Eleuther finds that GPT-J-6B's performance is roughly on par with OpenAI's 'GPT3-Curie' model, and that it outperforms other GPT3 variants.
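
Try it in code: if you'd rather poke at GPT-J-6B programmatically than via the web demo, here's a minimal sampling sketch using the Hugging Face transformers library - this assumes the weights are mirrored on the Hugging Face hub under an identifier like `EleutherAI/gpt-j-6b` (the canonical weights live in the JAX GitHub repo) and that you have tens of gigabytes of memory free.

```python
# Hedged sketch: sampling from GPT-J-6B via Hugging Face transformers.
# Assumes the checkpoint is available as "EleutherAI/gpt-j-6b" and that
# you have enough RAM/VRAM to hold a 6B parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

prompt = "The scaling will continue until"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```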

A word about Google: I imagine I'll get flak for this, but it remains quite mysterious to me that Google is providing (some of) the compute for these model replications while not really acknowledging that it's doing so. Does this mean Google's official policy on language models is that it wants them to proliferate on the open internet? It'd be nice to know Google's thinking here - by comparison, Eleuther has actually published a reasonably lengthy blog post explaining why they're doing this - and while I may not agree with all the arguments, it feels good that they're legible. I wonder who at Google is giving the compute to this project and what they think? I hope they write about it.
  Check out the Eleuther API to the 6B right here (Eleuther AI).
  Read more: GPT-J-6B: 6B JAX-Based Transformer (Aran Komatsuzaki, blog).
  Get the model from the GitHub repo here.
  Read Eleuther's post on "Why Release a Large Language Model".

###################################################

Self-driving car expert launches startup with $83.5 million in funding:
...Raquel Urtasun's next step...
Waabi is a new self-driving car startup that launched last week with an $83.5 million Series A funding round. Waabi is notable for its name (which my autocorrect tells me is really Wasabi), and for founder Raquel Urtasun's background - she previously led research for Uber's self-driving car effort, and helped develop the widely-used KITTI vision benchmark suite. Waabi's technology uses "deep learning, probabilistic inference and complex optimization to create software that is end-to-end trainable, interpretable and capable of very complex reasoning", according to the launch press release. Waabi will initially focus on applying its technology to long-haul trucking and logistics.
  Read more: Waabi launches to build a pathway to commercially viable, scalable autonomous driving (GlobeNewswire, PR).
  Find out more at the company's website.

###################################################

Want to get a look at the future of robotics? Sanctuary.AI has a new prototype machine:
...Ex-Kindred and D-Wave team is betting on a 'labor-as-a-service' robot workforce...
Sanctuary AI, a Canadian AI startup founded by former Kindred roboticists and D-Wave quantum scientists, thinks that generally intelligent machines will need to be developed in an embodied environment. Because of this, they're betting big on robotics - going so far as to design their own custom machines, in the hopes of building a "general purpose robot workforce".

Check out these robots: The Sanctuary.AI approach fuses deep learning, robotics, and symbolic reasoning and logic for what they say is "a new approach to artificial general intelligence". What's different about them is that they already seem to have some nice, somewhat novel hardware, and they've recently published some short videos about the control scheme for their robots, how they think, and how their hands work.

Why this matters: There's a lot of economic value to be had in software, but much of the world's economy runs in the physical world. And as seasoned AI researchers know, the physical world is a cruel environment for the sorts of brittle, poor-at-generalization AI systems we have today. Therefore, Sanctuary's idea of co-developing new AI software with underlying hardware represents an interesting bet that they can close this gap - good luck to them.
  Find out more on their website: Sanctuary.ai.

###################################################

Which datasets are actually useful for testing NLP? And which are useless? Now we have some clues:
...Item Response Theory helps us figure out which AI tests are worth doing, and which are ones we've saturated...
Natural language processing and understanding have recently gotten much better, thanks to architectural inventions like the Transformer and its application to a few highly successful, widely-used models (e.g., BERT, GPT3, RoBERTa). This improvement in performance has been coupled with the emergence of new datasets and tests for sussing out the capabilities of these systems. Now, researchers with Amazon, NYU, and the Allen Institute for AI have analyzed these test sets to try and work out which of them are useful for assessing the performance of cutting-edge AI systems.

What datasets matter? After analyzing 29 test sets, they find that "Quoref, HellaSwag, and MC-TACO are best able to discriminate among current (and likely future) strong models. Meanwhile, SNLI, MNLI, and CommitmentBank seem to be saturated and ineffective for measuring future progress." Along with this, they find that "SQuAD2.0, NewsQA, QuAIL, MC-TACO, and ARC-Challenge have the most difficult examples" for current models. (That said, they caution researchers that "models that perform well on these datasets should not be deployed directly without additional measures to measure and eliminate any harms that stereotypes like these could cause in the target application settings.")

How they did it: They used a technique called Item Response Theory, "a statistical framework from psychometrics that is widely used for the evaluation of test items in educational assessment", to help them compare different datasets to one another.
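
A rough intuition for IRT: each test item gets a 'difficulty' and a 'discrimination' parameter, each model gets a latent 'ability', and the probability of a correct answer is a logistic function of the gap between ability and difficulty. The sketch below shows the standard two-parameter logistic form - a simplified illustration, not necessarily the exact model used in the paper. Items that every strong model answers correctly stop discriminating, which is what 'saturated' means here.

```python
# Simplified sketch of a two-parameter logistic (2PL) IRT item response
# function - an illustration, not the paper's exact model. `ability` is the
# latent skill of a model; `difficulty` and `discrimination` are item params.
import numpy as np

def p_correct(ability, difficulty, discrimination=1.0):
    # Probability a respondent with `ability` answers this item correctly.
    return 1.0 / (1.0 + np.exp(-discrimination * (ability - difficulty)))

# An easy, saturated item barely separates two strong models...
print(p_correct(2.0, -1.0), p_correct(3.0, -1.0))    # ~0.95 vs ~0.98
# ...while a hard, discriminative item separates them clearly.
print(p_correct(2.0, 2.5, 2.0), p_correct(3.0, 2.5, 2.0))  # ~0.27 vs ~0.73
```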

Why this matters: Where are we and where are we going? It's a simple question that, in AI research, is typically hard to answer. That's because where we think we are is sometimes a false location, because the AI systems we're using are cheating, and where we think we're heading is an illusion, because of the aforementioned cheating. On the other hand, if we can zoom out and look holistically at a bunch of different datasets, we have a better chance of establishing our true location, because it's relatively unlikely that all our AI techniques are giving hacky responses to hard questions. Therefore, work like this gives us new ways to orient ourselves with regard to future AI progress - that's important, given how rapidly capabilities are being developed and fielded.
  Read more: Comparing Test Sets with Item Response Theory (arXiv).

###################################################

Tech Tales:

A 21st Century Quest For A Personal Reliquary
[A declining administrative zone in mid-21st Century America]

"For fucks sake you sold it? We were going to pay a hundred."
"And they paid one fifty."
"And you didn't call us?"
"They said they didn't want a bidding war. One fifty gets my kids a pass to another region. What am I supposed to do?"
"Sure," I press my knuckles into my eyes a bit. "You've gotta know where I can get something else."
"Give me a few days."
"Make it a few hours and we'll pay you triple. That'd get you and your wife out of here as well."
"I'll see what I can do."
And so I walked away from the vintage dealer, past the old CRT and LCD monitors, negotiating my way around stacks of PC towers, ancient GPUs, walls of hard drives, and so on. Breathed the night air a little and smelled burning from the local electricity substation. Some sirens started up nearby so I turned my ears to noise-cancelling mode and walked through the city, staring at the lights, and thinking about my problems.

The Baron would fire me for this, if he wasn't insane. But he was insane - Alzheimer's. Which meant I had time. Could be an hour or could be days, depending on how lucid he is, and if anything triggers him. Most of his staff don't fire people on his first request, these days, but you can't be sure.
  Got a message on my phone - straight from the Baron. "I need my music, John. I need the music from our wedding."
  I didn't reply. Fifty percent chance he'd forget soon. And if he was conscious and I said I didn't have it, there was a fifty percent chance he'd fire me. So I sat and drank a beer at a bar and messaged all the vintage dealers I knew, seeing if anyone could help me out.

An hour later I got a message from the dealer that they had what I needed. I walked there and en route I got a call from the Baron, but I ignored it and let it go to voicemail. "Martha, you must come and get me. I have been imprisoned. I do not know where I am. Martha, help me." And then there was the sound of crying, and then some banging, and then weak shouting in the distance of 'no, give it back, I must speak to Martha', and then the phone hung up. In my mind, I saw the nurses pulling the phone away and hanging it up, trying to soothe the Baron, probably some of them getting fired if he turned lucid, probably some of them crying - even tyrants can elicit sympathy, sometimes.

When I got there the dealer handed me a drive. I connected it to my verifier and waited a few minutes while the tests ran. When it came back green I paid him the money. He'd already started packing up his office.
  "Do you think it'll be better, if you leave?" I said.
  "It'll be different that's for sure," he said, "and that'll be better, I think."
  I couldn't blame him. The city was filthy and the barons that ran it were losing their minds. Especially mine.

It took me a couple of hours to get to the Baron's chambers - so many layers of security, first at the outskirts of the 'administrative zone', and then more concentric circles of security, with more invasive tests - physical, then cognitive/emotional. Trying to work out if I'd stab someone with a nearby sharp object, after they'd verified no explosives. That's how it is these days - you can work somewhere, but if you leave and go into the city, people worry you come back angry.

I got to the Baron's chambers and he looked straight at me and said "Martha, help me," and began to sob. Then I heard the distinct sound of him urinating and wetting himself. Saw nurses in my peripheral vision fussing around him. I walked over to the interface and put the drive into it, then pressed play. The room filled with sounds of strings and pianos - an endless river of music, tumbling out of an obscure, dead-format AI model, trained on music files that themselves had been lost in the e-troubles a few years ago. It was music played at his wedding and he had thought it lost and in a moment of lucidity demanded I find it. And I did.

I looked out the windows at the smog and the yellow-tinted clouds and the neon and the smoke rising from people burning old electronics to harvest copper. And behind me the Baron continued to cry. But at one point he said "John, thank you. I can remember it so clearly", and then he went back to calling me Martha. I looked at my hands and thought about how I had used them to bring him something that unlocked his old life. I do not know how long this region has, before the collapse begins. But at least our mad king is happy and perhaps more lucid, for a little while longer.

Things that inspired this story: Alzheimer's; memory; memory as a form of transportation, a means to break through our own limitations; dreams of neofeudalism as a consequence of great technical change; the cyberpunk future we may deserve but not the one we were promised.


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf
