Import AI 296: $100k to find flaws in LLMs; NVIDIA uses RL to make better chip parts; + 256GB of law data, and a story about the cyber gerontocracy!

Will we ever have 'old growth' computers in the way we have 'old growth' forests today? In the same way there are mainframes that have been running (albeit with parts swapped out and software migrated) for 50+ years, will we have far more aged computers in the future - planetary systems that may stretch to hundreds of years old, tended to by people who, much like today's COBOL whisperers, are acolytes of a dying yet still vitally important (and as-yet-uninvented) language? 

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

From the no good, very bad idea department: Dead Supreme Court Justice bot:
…Where AI PR goes wrong…
Here's a demo from AI21 Labs where they take one of their language models, give it loads of data relating to deceased Supreme Court Justice Ruth Bader Ginsburg, and create a bot that you can talk to and get a 'yes/no' answer about any question.
  The "What would RBG (probably) say?" site is a nice example of where AI PR goes wrong - you're taking an exciting technology (AI21 is one of the few credible developers of large-scale language models) to create a demo site where people can… what? Get fuzzy predictions from a system presented as an Oracle which is in fact a weird stochastic blob of neural computation fed on some strings of text.

Charitably, the creators of this might view it as a way to make the technology and its implications more accessible, but I worry this kind of demo just preys upon credulity and also disrespects the recently dead in the process.

What the model thinks about this: Anyway, that's what I think. I figured I'd ask the dead-oracle what it thought. Here's what I asked: "Should AI companies resurrect the dead in service of weird marketing schemes?". Here was the answer: "NO. [Laughs] Absolutely not. Just think about what you're suggesting. It's a wonderful idea, but think about the ethics of it."
  Find out more: ask-rbg.ai

####################################################

NVIDIA uses reinforcement learning to make its chips better:
…Enter the era of the recursively self-improving chip company…
NVIDIA has used reinforcement learning to help it design more efficient arithmetic circuits for its latest 'H100' class of GPUs. "The best PrefixRL adder achieved a 25% lower area than the EDA tool adder at the same delay," NVIDIA writes in a blog describing the research. "To the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits."

Why this matters - recursively improving stacks: Sometimes people like to talk about recursively self-improving AI. That's a fun, freaky, and likely quite distant concept. But do you know what is here now? AI that helps recursively improve the companies that develop AI. If we zoom out, it's quite wild that a chip+AI company is now using AI to increase the efficiency of its chips which will in turn increase the efficiency of the AI systems being developed on those same chips. The world turns faster and faster. 
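The circuits PrefixRL optimizes are parallel prefix adders, where the design space is the shape of the carry-computation tree. As a rough illustration of what that class of circuit does (this is a toy software model for intuition, not NVIDIA's method or code), here's a Kogge-Stone-style prefix adder sketched in Python:

```python
# Toy software model of a parallel prefix adder - the class of arithmetic
# circuit PrefixRL searches over. Illustrative only, not NVIDIA's code.

def prefix_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned ints via a Kogge-Stone prefix-carry network."""
    g = [(a >> i & 1) & (b >> i & 1) for i in range(width)]  # generate bits
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(width)]  # propagate bits
    raw_p = p[:]  # keep the raw propagate bits for the final sum XOR
    # Prefix combine: merge (g, p) pairs at doubling distances. Different
    # tree shapes here trade circuit area against delay - that's the
    # design space the RL agent explores.
    dist = 1
    while dist < width:
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            # (g, p) o (g', p') = (g | (p & g'), p & p')
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
    # Carry into bit i is the group-generate over bits [0, i-1].
    carries = [0] + g[:width - 1]
    result = 0
    for i in range(width):
        result |= (raw_p[i] ^ carries[i]) << i
    return result

assert prefix_add(100, 55) == 155
```

Kogge-Stone is one corner of the area/delay tradeoff (fast, but large); the point of the RL approach is to find tree shapes that beat the standard named topologies.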

   Read more: Designing Arithmetic Circuits with Deep Reinforcement Learning (NVIDIA blog).

####################################################

Facebook builds a vast machine translation model and releases it as open source:
…Who builds the lenses that translate across cultures, and what does it mean to be a lens builder?…
Facebook has announced a project called 'No Language Left Behind' (NLLB), which consists of a family of models that can translate between 200 distinct languages, as well as an evaluation dataset for testing out the performance of each language translation. Facebook is using NLLB within its own websites to aid with translation on Facebook and Instagram, and the company has released a bunch of NLLB models for free. 

What's special about NLLB: There are tons of ML translation models floating around the internet. One of the main differences here is NLLB's expanded support for low-resource languages like Kamba, Lao, and a bunch of African languages. "In total, NLLB-200’s BLEU scores improve on the previous state of the art by an average of 44 percent across all 10k directions of the FLORES-101 benchmark. For some African and Indian languages, the increase is greater than 70 percent over recent translation systems," Facebook writes. 


Why this matters: Models like NLLB are going to serve as a real world 'babelfish' to translate between different cultures. But the fact these models get trained once and deployed at vast scales means they'll likely have a significant downstream impact on culture - similar to how the early Encyclopedias described (and circumscribed) what many considered public knowledge. Facebook does acknowledge some of this by studying the potential harms and biases of the models, but I generally think the world isn't aware of how dependent foundational capabilities like translation are becoming on just a tiny number of (well intentioned) actors. 

   Read more: 200 languages within a single AI model: A breakthrough in high-quality machine translation (Facebook blogpost).
   Read the research paper: No Language Left Behind: Scaling Human-Centered Machine Translation (Facebook Research).
   Get the models: Facebook FairSeq (GitHub).


####################################################

Pile of Law: 256GB of legal data:
…Legal language models are about to get a whole bunch better, plus - lessons for data stewardship…
Stanford researchers have built the 'Pile of Law', a ~256GB dataset of text data relating to legal and administrative topics. The dataset will serve as a useful input for pre-training models, and it also serves as a case study for some of the complicated questions data creators face - namely, how to filter data. 

What the Pile of Law is: The dataset consists of "data from 35 data sources, including legal analyses, court opinions and filings, government agency publications, contracts, statutes, regulations, casebooks, and more".

What making the Pile of Law taught them: Because the dataset is based on tons of legal texts, it comes with some in-built filtering. Most jurisdictions they take data from protect the identities of minors, and "no jurisdiction normally permits the publication of financial account numbers, dates of birth, or identity numbers like social security numbers," they also note.
  This means, somewhat similar to how California Protected Categories have become a quasi standard for assessing some of the traits of language models, U.S. court rules may serve as a "floor" for filtering datasets. "Such privacy filtering rules would already go beyond much of current modeling practice," they note. 
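A privacy "floor" like the one the paper describes amounts to redacting identifier patterns before pretraining. Here's a minimal sketch of that idea - the regex patterns and redaction tokens are illustrative assumptions, not the authors' actual pipeline:

```python
# Minimal sketch of court-rule-style privacy filtering: redact US social
# security numbers and dates of birth from text before pretraining.
# Patterns and tokens are illustrative, not the Pile of Law pipeline.
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DOB = re.compile(r"\b(?:born|DOB)[:\s]+\d{1,2}/\d{1,2}/\d{4}", re.IGNORECASE)

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    text = SSN.sub("<SSN>", text)
    text = DOB.sub("<DOB>", text)
    return text

print(redact("Plaintiff (SSN 123-45-6789, DOB: 1/2/1980) alleges..."))
```

A production filter would need far more patterns (account numbers, minors' names, jurisdiction-specific rules), but the shape is the same: the legal rules become a redaction pass over the corpus.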

   Read more: Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset (arXiv).

   Get the dataset and check out the Model Card here (HuggingFace).

####################################################

Find ways in which language models ANTI-SCALE and get $100k!
…New prize tries to find things that are the opposite of progress…
A bunch of NYU-linked researchers have created the 'Inverse Scaling Prize', a competition to find tasks where performance decreases as you scale up the size of the underlying model. This is a clever idea - AI, as Import AI readers know, has recently seen such rapid and sustained increases in capabilities that measuring progress has become challenging as benchmarks get saturated (see figure 1 from the 'Dynabench' paper). But despite all that progress, we know that AI models exhibit negative traits, some of which also scale with size (e.g., potential for toxic outputs in LMs). The Inverse Scaling Prize has a chance of generating better information about traits that display an anti-scale property. 

"We hope that task submissions will teach us more about what types of tasks exhibit inverse scaling; inverse scaling tasks will also highlight potential issues with the current paradigm of language model pretraining and scaling. Inverse scaling tasks are important because they represent a mismatch between the behavior we want language models to exhibit and the behavior we get in practice from the training objectives and data we use," the authors write. 
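Operationally, "inverse scaling" just means a negative trend in task performance as model size grows. A minimal sketch of how you'd check for it, using made-up accuracy numbers for a hypothetical model family:

```python
# Sketch of detecting inverse scaling: fit the slope of task accuracy
# against log10(parameter count). The data points are invented for
# illustration; a negative slope indicates inverse scaling.
import math

# (parameter count, task accuracy) for a hypothetical model family
results = [(1e8, 0.62), (1e9, 0.55), (1e10, 0.48), (1e11, 0.41)]

def scaling_slope(results):
    """Least-squares slope of accuracy vs. log10(parameters)."""
    xs = [math.log10(n) for n, _ in results]
    ys = [acc for _, acc in results]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

slope = scaling_slope(results)
print(f"slope per decade of scale: {slope:.3f}")  # negative => inverse scaling
```

The actual prize evaluates submissions across a suite of real model sizes rather than a toy fit like this, but the monotone-decrease-with-scale signature is the thing being hunted.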

Prize details: The competition has a $250,000 prize purse: a $100,000 grand prize, up to 5 second prizes of $20,000 apiece, and up to 10 third prizes of $5,000 each. 

   Find out more and enter here: Inverse Scaling Prize (GitHub).

####################################################


Hark, a new org for investigating AI progress launches!
…Epoch has an experienced team and an interesting research agenda…
There's a new AI progress org in town: Epoch. Unlike the recent flurry of new AI startups focused on developing capabilities or aiding in alignment research, Epoch is more meta - the org's goal is to analyze trends in machine learning, and to develop quantitative forecasting models related to advanced AI capabilities. In other words, Epoch might be one of the orgs that ends up pulling the metaphorical 'fire alarm' about imminent, rapid progress in advanced AI - and given the stakes, it's good to have more people in position to pull this alarm.
  "We expect to be hiring for several full-time research and management roles this summer. Salaries range from $60,000 for entry roles to $80,000 for senior roles," the organization writes.
  Find out more at the official site: Epoch.

####################################################

The Family Trade

[Dyson sphere, within 200 light years of Earth solar system, 40,000 AD]

My partner and I are about to create our offspring, so we need to work out when we want to die. In our society, death is a condition of life. Since we're made out of software, we can theoretically live forever, and our study of human history has shown that societies ruled by the increasingly old are societies that go into terminal decline, as all resources get diverted to serve the people living at the upper bound of the age distribution. 

   Despite our Dyson spheres, our efficient spacecraft, our trillions of souls housed in facilities embedded deep in moons with stable orbits, we still have finite resources. Infinity tends to do that - you may think you have a lot of something, but if you put it up against infinity, it becomes nothing very quickly. 

So that's why parents have to die. Not immediately, obviously - part of the value in having offspring is to introduce heterogeneity into our own species, and to learn about how to be good (and bad) parents and share what we know with the rest of our species. But die we must - so we select a date. That date can be anywhere from ten human years to a thousand human years after the birth of the last offspring (we can choose to have multiple ones, but must plan ahead of time).

We consider this a mark of honor in our society. Still, writing this as we choose the date of our death, my partner and I must confess we do feel _something_. But we must do this, as our parents did for us. 

There are fewer and fewer of us as time goes on - both children, and those willing to give their lives to become parents. Immortality is addictive.

Things that inspired this story: The experience of living in a society serving a failing gerontocracy; evolutionary pressure and the need for it; ideas for how the notion of sacrifice may continue to live even if we take the cost of resources to (close to) zero.


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2022 Import AI, All rights reserved.
You are receiving this email because you signed up for it. Welcome!

Our mailing address is:
Import AI
Many GPUs
Oakland, California 94609

