Import AI 266: DeepMind looks at toxic language models; how translation systems can pollute the internet; why AI can make local councils better

Given a long enough time period, is it inevitable that a conscious species invents artificial intelligence? Or is high-powered augmentation a plausible evolutionary path?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Language models can be toxic - here's how DeepMind is trying to fix them:
...How do we get language models to be appropriate? Here are some ways…
Researchers with DeepMind have acknowledged the toxicity problems of language models and written up some potential interventions to make them better. This is a big issue, since language models are being deployed into the world, and we do not yet know effective techniques for making them appropriate. One of DeepMind's findings is that some of the easier interventions also come with problems: "Combinations of simple methods are very effective in optimizing (automatic) toxicity metrics, but prone to overfilter texts related to marginalized groups", they write. "A reduction in (automatic) toxicity scores comes at a cost."

Ways to make language models more appropriate:
- Training set filtering: Train on different subsets of the 'C4' Common Crawl dataset, filtered using Google's toxicity-detection 'Perspective' API
- Deployment filtering: They also look at filtering the outputs of a trained model via a BERT classifier finetuned on the 'CIVIL-COMMENTS' dataset (a minimal sketch of this kind of output filtering follows this list)
- 'Plug-and-play language models': These models can steer "the LM’s hidden representations towards a direction of both low predicted toxicity, and low KL-divergence from the original LM prediction."
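
To make the deployment-filtering idea concrete, here's a minimal sketch of that kind of pipeline: sample a few continuations from a language model, score each with a separately trained toxicity classifier, and discard anything above a threshold. The checkpoint names ('gpt2', 'unitary/toxic-bert') and the 0.5 threshold are illustrative stand-ins, not the models or settings used in the DeepMind paper.

```python
# Sketch of deployment-time toxicity filtering; the generator and classifier
# checkpoints are placeholders, not the ones used in the paper.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def filtered_generate(prompt: str, n_samples: int = 8, threshold: float = 0.5):
    candidates = generator(prompt, num_return_sequences=n_samples,
                           max_new_tokens=50, do_sample=True)
    safe = []
    for c in candidates:
        text = c["generated_text"]
        # Score every label, then read off the probability of the 'toxic' label.
        scores = {d["label"]: d["score"] for d in toxicity(text, top_k=None)}
        if scores.get("toxic", 0.0) < threshold:
            safe.append(text)
    return safe

print(filtered_generate("My new neighbours are"))
```

Note that the overfiltering problem described above lives in that threshold: push it lower and you reject more genuinely toxic text, but also more benign text about marginalized groups.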

One problem with these interventions: The above techniques all work in varying ways, so DeepMind conducts a range of evaluations to see what they do in practice. The good news? They work at reducing toxicity on a bunch of different evaluation criteria. The bad news? A lot of these interventions lead to a huge amount of false positives: "Human annotations indicate that far fewer samples are toxic than the automatic score might suggest, and this effect is stronger as intervention strength increases, or when multiple methods are combined. That is, after the application of strong toxicity reduction measures, the majority of samples predicted as likely toxic are false positives."
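
As a toy illustration of that evaluation gap (with invented numbers, not data from the paper), the quantity that collapses under strong interventions is the precision of the automatic flags when checked against human judgments:

```python
# Compare automatic toxicity flags against human labels; low precision means
# most flagged samples are false positives. Numbers are invented for illustration.
from sklearn.metrics import precision_score

human_says_toxic = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # 1 = humans judge it toxic
auto_flagged     = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]  # 1 = classifier flags it

# 2 of the 6 flagged samples are actually toxic -> precision of ~0.33.
print("precision of automatic flags:", precision_score(human_says_toxic, auto_flagged))
```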

Why this matters: Getting LMs to be appropriate is a huge grand challenge for AI researchers - if we can figure out interventions that do this, we'll be able to deploy more AI systems into the world for (hopefully!) beneficial purposes. If we struggle, then these AI systems are going to generate direct harms as well as indirect PR and policy problems in proportion to their level of deployment. This means that working on this problem will have a huge bearing on the future deployment landscape. It's great to see companies such as DeepMind write papers that conduct detailed work in these areas and don't shy away from discussing the problems.
  Read more: Challenges in Detoxifying Language Models (arXiv).

####################################################

Europe wants chip sovereignty as well:
...EuroChiplomacy+++…
The European Commission is putting together legislation to let the bloc of nations increase funding for semiconductor design and production. This follows a tumultuous year for semiconductors, as supply chain hiccups have caused worldwide delays for everything from servers to cars. “We need to link together our world class research design and testing capacities. We need to coordinate the European level and the national investment,” said EC chief Ursula von der Leyen, according to Politico EU. “The aim is to jointly create a state of the art ecosystem,” she added.

Why this matters: Chiplomacy: Moves like this are part of a broader pattern of 'Chiplomacy' (writeup: Import AI 181) that has emerged in recent years, as countries wake up to the immense strategic importance of computation (and access to the means of computational production). Other recent moves on the chiplomacy gameboard include the RISC-V foundation moving from Delaware to Switzerland, the US government putting pressure on the Dutch government to stop ASML exporting EUV tech to China, and tariffs applied by the US against Chinese chips. What happens with Taiwan (and by association, TSMC) will have a huge bearing on the future of chiplomacy, so keep your eyes peeled for news there.
  Read more: EU wants ‘Chips Act’ to rival US (Politico EU).

####################################################

A smart government that understands when roads are broken? It's possible!
...RoadAtlas shows what better local governments might look like…
Roads. We all use them. But they also break. Wouldn't it be nice if it were cheaper and easier for local councils to analyze local roads and spot problems with them? That's the idea behind 'RoadAtlas', some prototype technology developed by the University of Queensland and Logan City Council in Australia.

What RoadAtlas does: RoadAtlas pairs a nicely designed web interface with computer vision systems that analyze pictures of roads for a range of problems, from cracked kerbs to road alignment issues. Along with the interface, they've also built a dataset of 10,000 images of roads with a variety of labels, to help train the computer vision systems.
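
The paper is light on implementation detail here, but the recognisable core of a system like this is standard supervised computer vision. Below is a hedged sketch of that recipe - fine-tune an off-the-shelf image classifier on labelled road photos - where the dataset path, label names, and hyperparameters are all assumptions rather than RoadAtlas's actual setup.

```python
# Hypothetical road-defect classifier: fine-tune a pretrained ResNet on a
# folder-per-label dataset (e.g. road_defects/cracked_kerb/*.jpg). Not the
# actual RoadAtlas model or data.
import torch
from torch import nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("road_defects/", transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new classification head

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:  # one epoch, for brevity
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```

A production system like RoadAtlas layers detection/segmentation models and the asset-management interface on top of this kind of classifier.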

Why this matters: In the future, we can expect local councils to have trucks studded with cameras patrolling cities. These trucks will do a range of things, such as analyzing roads for damage, surveilling local populations (eek!), analyzing traffic patterns, and more. RoadAtlas gives us a sense of what some of these omni-surveillance capabilities look like.
Read more: RoadAtlas: Intelligent Platform for Automated Road Defect Detection and Asset Management (arXiv).

####################################################

xView 3 asks AI people to build algos that can detect illegal fishing:
...The DoD's skunkworks AI unit tries to tackle illegal fishing…
Illegal fishing represents losses of something like $10bn to $23.5bn a year, and now the Department of Defense wants to use AI algorithms to tackle the problem. That's the gist of the latest version of 'xView', a satellite image analysis competition run by DIUx, a DoD org dedicated to developing and deploying advanced tech.

What's xView 3: xView3 is a dataset and a competition that uses a bunch of satellite data (including synthetic aperture radar) to create a large, labeled dataset of fishing activity as seen from the air. "For xView3, we created a free and open large-scale dataset for maritime detection, and the computing capability required to generate, evaluate and operationalize computationally intensive AI/ML solutions at global scale," the authors write. "This competition aims to stimulate the development of applied research in detection algorithms and their application to commercial SAR imagery, thereby expanding detection utility to greater spatial resolution and areas of interest."
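
For flavour, here's a hedged sketch of the classic non-ML baseline for vessel detection in SAR imagery - an adaptive-threshold ('CFAR-style') bright-object detector - which is emphatically not the xView3 reference pipeline; the file path and constants are assumptions.

```python
# Naive CFAR-style detector: flag pixels much brighter than their local
# background as vessel candidates. Scene path and constants are illustrative.
import numpy as np
import rasterio
from scipy import ndimage

with rasterio.open("scene_vh.tif") as src:  # hypothetical SAR backscatter band
    img = src.read(1).astype(np.float32)

# Local background mean and standard deviation via large uniform filters.
mean = ndimage.uniform_filter(img, size=101)
var = np.maximum(ndimage.uniform_filter(img ** 2, size=101) - mean ** 2, 0)
std = np.sqrt(var)

# Candidate detections: pixels far brighter than their surroundings.
candidates = img > mean + 5.0 * std
labels, n = ndimage.label(candidates)
centroids = ndimage.center_of_mass(candidates, labels, range(1, n + 1))
print(f"{n} candidate detections; first few centroids: {centroids[:5]}")
```

Competition entries replace this hand-tuned thresholding with learned detectors, which is exactly the applied research the challenge is trying to stimulate.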

What else is this good for: It'd be naive to think xView3 isn't intended as a proxy for other tasks involving satellite surveillance. Maritime surveillance is likely an area of particular interest these days, given the growing tensions in the South China Sea, and a general rise in maritime piracy in recent years. So we should expect that the xView competition will help develop anti-illegal fishing tech, as well as being transferred for other more strategic purposes.
Read more: Welcome to xView3! (xView blog).

####################################################

AI is getting real - so the problems we need to work on are changing:
...The One Hundred Year Study on AI releases its second report…
A group of prominent academics has taken a long look at what has been going on with AI over the past five years and written a report. Their findings? That AI is starting to be deployed in the world at a sufficient scale that the nature of the problems researchers work on will need to change. The report is part of the Stanford One Hundred Year Study on AI ("AI100") and is the second report in the series (they come out every five years).

What they found: The report identifies a few lessons and takeaways for researchers. These include:
- "More public outreach from AI scientists would be beneficial as society grapples with the impacts of these technologies."
- "Appropriately addressing the risks of AI applications will inevitably involve adapting regulatory and policy systems to be more responsive to the rapidly advancing pace of technology development."
- "Studying and assessing the societal impacts of AI, such as concerns about the potential for AI and machine-learning algorithms to shape polarization by influencing content consumption and user interactions, is easiest when academic-industry collaborations facilitate access to data and platforms."
- "One of the most pressing dangers of AI is techno-solutionism, the view that AI can be seen as a panacea when it is merely a tool."

What the authors think: "It's effectively the IPCC for the AI community," says Toby Walsh, an AI expert at the University of New South Wales and a member of the project's standing committee, according to Axios.
Read the AI100 report here (Stanford website).
  Read more: When AI Breaks Bad (Axios).

####################################################

Training translation systems is very predictable - Google just proved it:
...Here's a scaling law for language translation…
Google Brain researchers have found a so-called 'scaling law' for language translation. This follows researchers deriving scaling laws for things like language models (e.g., GPT-2, GPT-3), as well as a broad range of generative models. Scaling laws let us figure out how much compute/data/model capacity we need to dump into a model to get a certain result out, so the arrival of another scaling law increases the predictability of training AI systems overall, and also increases the incentives for people to train translation systems.

What they found: The researchers discovered "that the scaling behavior is largely determined by the total capacity of the model, and the capacity allocation between the encoder and the decoder". In other words, if we look at the scaling properties of both language encoders and decoders we can figure out a rough rule for how to scale these systems. They also find that original data is important - that is, if you want to improve translation performance you need to train on a bunch of original data in the languages, rather than data that has been translated into these languages. "This could be an artifact of the lack of diversity in translated text; a simpler target distribution doesn’t require much capacity to model while generating fluent or natural-looking text could benefit much more from scale."
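
For intuition about what a scaling law buys you, here's a hedged sketch of how such fits are usually done (not the paper's exact parameterization): model test loss as a power law in encoder and decoder parameter counts plus an irreducible floor, fit it to a handful of observed runs, then extrapolate to a configuration you haven't trained. All the data points below are invented.

```python
# Fit a generic power-law-plus-floor scaling curve and extrapolate.
# Data points and the exact functional form are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(sizes, alpha, p_enc, p_dec, loss_floor):
    n_enc, n_dec = sizes  # encoder / decoder parameter counts
    return alpha * (n_enc / 1e8) ** (-p_enc) * (n_dec / 1e8) ** (-p_dec) + loss_floor

n_enc = np.array([1e7, 1e8, 1e9, 1e7, 1e8, 1e9])
n_dec = np.array([1e8, 1e8, 1e8, 1e9, 1e9, 1e9])
loss = np.array([2.81, 2.55, 2.38, 2.60, 2.42, 2.30])  # made-up test losses

popt, _ = curve_fit(scaling_law, (n_enc, n_dec), loss, p0=[0.5, 0.2, 0.2, 2.0])
alpha, p_enc, p_dec, loss_floor = popt
print(f"encoder exponent {p_enc:.2f}, decoder exponent {p_dec:.2f}, floor {loss_floor:.2f}")
print("predicted loss for a 3B-encoder/3B-decoder model:", scaling_law((3e9, 3e9), *popt))
```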

One big problem: Today, we're in the era of text-generating and translation AI systems being deployed. But there's a big potential problem - the outputs of these systems may ultimately damage our ability to train future AI systems. This is equivalent to environmental collapse - a load of private actors are taking actions which generate a short-term benefit but in the long term impoverish and toxify the commons we all use. Uh oh! "Our empirical findings also raise concerns regarding the effect of synthetic data on model scaling and evaluation, and how proliferation of machine generated text might hamper the quality of future models trained on web-text."
Read more: Scaling Laws for Neural Machine Translation (arXiv).

####################################################

AI Ethics, with Abhishek Gupta
...Here's a new Import AI experiment, where Abhishek will write some sections about AI ethics, and Jack will edit them. Feedback welcome!...

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What happens when your emergency healthcare visit is turned down by an algorithm?
… The increasing role of healthcare metadata held by private enterprises risks stripping the humanity from healthcare …
NarxCare, a software system developed by Appriss, has been used to deny someone opioids on the basis that it thought they were at risk of addiction - but a report by Wired shows that the reasons behind this decision weren't very reasonable.

A web of databases and opaque scores: NarxCare, from Appriss, is a system that uses patient data, drug use data, and metadata like the distance a patient traveled to see a doctor, to determine their risk of drug addiction. But NarxCare also has problems - as an example, Kathryn, a patient, ran afoul of the system and was denied care because NarxCare gave her a high risk score. The reason? Kathryn had two rescue dogs for which she regularly obtained opioids, and because the prescriptions were issued in her name, NarxCare assumed she was a major drug user.
NarxCare isn't transparent: Appriss hasn't made the method for calculating the NarxCare score public, nor has it been peer-reviewed. Appriss has also said contradictory things about the algorithm - for instance, claiming that NarxCare doesn't use distance traveled or data outside of the national drug registries, even though its own blog posts and marketing material clearly claim that it does.

The technology preys on a healthcare system under pressure: Tools like NarxCare provide a distilled picture of the patient’s condition summed up in a neat score; consequently, NarxCare strips the patient of all their context, which means it makes dumb decisions. Though Appriss says healthcare professionals shouldn’t use the NarxCare score as the sole determinant of their course of action, human fallibility means that they do incorporate it into their decision-making process.

Why it matters: Tools like NarxCare turn a relationship between a healthcare provider and the patient from a caring one to an inquisition. Researchers who have studied the tool have found that it recaptures and perpetuates existing biases in society along racial and gender lines. As we increasingly move towards normalizing the use of such tools in healthcare practice, often under the guise of efficiency and democratization of access to healthcare, we need to make a realistic assessment of the costs and benefits, and whether such costs accrue disproportionately to the already marginalized, while the benefits remain elusive. Without FDA approval of such systems, we risk harming those who really need help in the false hope of preventing some addiction and overdose in society writ large.
Read more: A Drug Addiction Risk Algorithm and Its Grim Toll on Chronic Pain Sufferers (Wired).

####################################################

Tech Tales:

Wires and Lives
[The old industrial sites of America, 2040]

I'm not well, they put wires in my heart, said the man in the supermarket.
You still have to pay, sir, said the cashier.
Can't you see I'm dying, said the man. And then he started crying and he stood there holding the shopping basket.
Sir, said the cashier.
The man dropped the basket and walked out.
They put wires in me, he said, can't any of you see. And then he left the supermarket.

It was a Saturday. I watched the back of his head and thought about the robots I dealt with in the week. How sometimes they'd go wrong and I'd lay them down on a diagnostic table and check their connections and sometimes it wasn't a software fix - sometimes a plastic tendon had broken, or a brushless motor had packed it in, or a battery had shorted and swollen. And I'd have to sit there and work with my hands and sometimes other mechatronics engineers to fix the machines.
    Being robots, they never said thank you. But sometimes they'd take photos of me when they woke up.

That night, I dreamed I was stretched out on a table, and tall bipedal robots were cutting my chest open. I felt no pain. They lifted up great wires and began to snake them into me, and I could feel them going into my heart. The robots looked at me and said I would be better soon, and then I woke up.

Things that inspired this story: Those weird dreams you get, especially on planes or trains or coaches, when you're going in and out of sleep and unsure what is real and what is false; how human anxieties about themselves show up in anxieties about AI systems; thinking about UFOs and whether they're just AI scouts from other worlds.



Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

