Import AI 284: 20bn GPT model; diachronic LMs; what people think about AI

 In one thousand years, what percentage of the 'thinking' occurring on Earth will derive from machines instead of biological organisms?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
 

Want a 20B parameter GPT-style language model? Go here!
…Eleuther releases the largest public open source AI model…
Last week, we wrote about how Eleuther was about to release a 20B parameter language model. Now, they have.
  Get the model here (Eleuther, GitHub).
  Read the research paper: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (PDF).
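If you want to poke at the model without wrangling Eleuther's training codebase, here's a minimal generation sketch via Hugging Face transformers - assuming the checkpoint is mirrored on the Hub as "EleutherAI/gpt-neox-20b" (an assumption on my part) and that you have roughly 40GB+ of accelerator memory for the fp16 weights:

```python
# Minimal generation sketch via Hugging Face transformers.
# Assumes the checkpoint is mirrored on the Hub as "EleutherAI/gpt-neox-20b"
# and that you have ~40GB+ of accelerator memory for the fp16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; shards weights across available devices
)

prompt = "Open-sourcing a 20B parameter language model means"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```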
####################################################

Want a language model that actually knows about COVID? You might need a diachronic model:
…Models trained on newer data do better - try them yourself…
Researchers with the University of Porto, Snap Inc., and Cardiff NLP have built a family of so-called 'time-aware' BERT-style language models, trained on Twitter data. The craziest part of this is that they're committing to "keep updating and releasing a new model every three months, effectively enabling the community to make use of an up-to-date language model at any period in time".

What the problem is: Most language models are trained on a dataset, then never updated. That means that some language models might have no knowledge of minor events like the global COVID pandemic. This is obviously a problem and the solution is simple (albeit labor-intensive) - periodically gather new data and re-train models.

What they did: They train a base RoBERTa model on Twitter data that cuts off in 2019, made up of 90 million tweets. Then, for each subsequent three-month period, they add 4.2 million tweets to the dataset and train a new model. At the time of writing, they've trained nine models in total, with the latest model (2021-Q4) being trained on 123.86 million tweets. The theory is that newer models should perform better on more modern tasks and evaluations.
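To give a flavor of what using one of these checkpoints looks like, here's a minimal sketch with the transformers fill-mask pipeline - the model identifier below is my guess at the 2021-Q4 release, so check the Cardiff NLP repo for the exact names:

```python
# Query an (assumed) TimeLM checkpoint with a masked prompt.
# The model name is my guess at the 2021-Q4 release; see the
# Cardiff NLP repo for the exact identifiers.
from transformers import pipeline

fill = pipeline("fill-mask", model="cardiffnlp/twitter-roberta-base-2021-124m")

# A model whose data stops in 2019 is unlikely to rank pandemic-era words highly here.
for pred in fill("I can't believe we still have to wear a <mask> on the bus."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```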


How well does it do? They compare their models against a few baselines, including BERTweet (which was trained on ~900m tweets). In tests, their models beat BERTweet on six out of seven benchmarks, though BERTweet gets the best overall performance. These aren't strictly 'time-aware' evaluations, though; they just test classification abilities for things like emotion, irony, stance, and so on. In separate time-aware tests, they find that pseudo-perplexity (PPPL) tends to increase by about 10% for each year a model is out of date - that is, a model gets roughly 10% worse at modeling up-to-date text for every year it goes without an update. "This result reinforces the need for updated language models even for short time periods," the researchers write.
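For reference, pseudo-perplexity is the masked-LM analogue of perplexity: mask each token in turn, score the true token under the model, and exponentiate the negative mean log-probability. Here's a rough illustrative sketch of that computation (not the paper's actual evaluation code, and the model name is again an assumption):

```python
# Rough sketch of pseudo-perplexity (PPPL) for a masked language model.
# Illustrative only - not the TimeLMs paper's evaluation code.
import math
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "cardiffnlp/twitter-roberta-base-2021-124m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

def pseudo_perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    log_probs = []
    # Mask one (non-special) token at a time and score the original token.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return math.exp(-sum(log_probs) / len(log_probs))

# Lower is better; stale models should score worse on text about recent events.
print(pseudo_perplexity("Everyone is talking about the new covid variant."))
```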

Why this matters: AI models naturally freeze-dry the cultural landscape they're trained on, meaning that if we don't get good at updating our models, we'll end up trapped in a world where many of our AI systems are outputting things relevant to prior eras and cultural trends - this will make them less useful, and holds the potential for creating feedback loops around cultural stagnation. AI models are weird mirrors of society, so we need to remake them as society changes. 

   Read more: TimeLMs: Diachronic Language Models from Twitter (arXiv).
  Get the models here (Cardiff NLP, Twitter).

####################################################

U.S. Army gets smart, semi-autonomous personal drones:
…Skydio gets a ~$20m a year contract…
Skydio, the company that makes drones which can navigate themselves semi-autonomously, has won a contract with the U.S. Army worth up to $99.8m over five years. Skydio was selected as part of the Army's procurement initiative around small, personal drones - the Short Range Reconnaissance (SRR) Program of Record. Skydio was chosen after the Army evaluated 30 small drone vendors. "Skydio drones deliver unparalleled situational awareness and ease of use in the most demanding situations thanks to Skydio Autonomy," said Skydio CEO, Adam Bry, in a press release.

Things that start out as toys become weapons: Skydio started out making drones advertised to sports enthusiasts who wanted a drone that could follow and film them as they ran around, snowboarded, hiked, climbed cliffs, or did any other high-octane Type A personality activity. It's funny how, after a few years of development, the company is now getting into the military. Many toys for rich people ultimately become weapons (and vice versa).

Why this matters: For many years, militaries have been centaurs - collectives of humans and machines working together. This has mostly happened at high levels of abstraction: satellites provide information to people managing teams, or teams of humans use bomb-disposal robots to deal with IEDs. With things like the Skydio contract, we're entering the era of the personal centaur - small groups of soldiers, or even individuals, will have their own little machine emissaries with which to conduct operations.
  Read more: U.S. Drone Maker Skydio Wins Production Other Transaction (OT) Agreement for U.S. Army Short Range Reconnaissance Program (Skydio).


####################################################

Simulators are the new platforms: Waabi unveils a self-driving car sim:
…Raquel Urtasun's startup wants to build a business on simulators…
Waabi, a self-driving car startup run by the former head of Uber's self-driving research team, Raquel Urtasun, has announced 'Waabi World', a simulator for training self-driving cars.

Distinguishing features: Waabi claims it is "the most scalable, highest fidelity closed-loop simulator ever" (I somehow doubt Tesla or Waymo would agree, but hey, they're not talking about their sims!). The simulator has four main features:
- High-fidelity world simulation: Uses AI to reconstruct real-world geometry, appearance, and material properties.
- High-fidelity sensor simulation: Uses AI and physics-based rendering "to simulate realistic sensor data in near real-time".
- Automatic stress-testing: Automatically generates challenging traffic scenarios to test out the simulated cars against.
- Reinforcement learning: Waabi uses RL to update the car agents so they can learn to drive in the simulation. (There's some very fluffy writing here and it doesn't say RL anywhere, but that's what I infer - see the toy sketch below.)
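For intuition on what 'closed-loop' means here - the agent's actions feed back into the simulated world state, rather than the agent just being scored against pre-recorded logs - here's a deliberately tiny toy sketch. None of this is Waabi's API (which is proprietary); the environment is invented, and a crude policy search stands in for whatever RL machinery they actually use:

```python
# Toy illustration of closed-loop simulation + policy optimization.
# NOT Waabi's API: the environment is invented, and a crude search over a
# one-parameter steering policy stands in for real reinforcement learning.
import numpy as np

class ToyLaneKeepingEnv:
    """1-D stand-in for a driving sim: state is the lateral offset from lane center."""

    def __init__(self, rng: np.random.Generator):
        self.rng = rng
        self.offset = 0.0

    def reset(self) -> float:
        self.offset = self.rng.uniform(-1.0, 1.0)
        return self.offset

    def step(self, steer: float) -> tuple[float, float]:
        # Closed loop: the next state depends on the agent's own action, plus noise.
        self.offset += 0.5 * steer + 0.05 * self.rng.normal()
        reward = -self.offset ** 2  # stay near the lane center
        return self.offset, reward

def rollout(env: ToyLaneKeepingEnv, gain: float, horizon: int = 50) -> float:
    """Run one simulated episode with a linear steering policy; return total reward."""
    state, total = env.reset(), 0.0
    for _ in range(horizon):
        state, reward = env.step(-gain * state)
        total += reward
    return total

rng = np.random.default_rng(0)
env = ToyLaneKeepingEnv(rng)
# "Learning" by brute-force search: keep whichever policy scores best in simulation.
best_return, best_gain = max((rollout(env, g), g) for g in np.linspace(0.0, 2.0, 41))
print(f"best steering gain {best_gain:.2f} with return {best_return:.1f}")
```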

Why this matters: Waabi World seems like a decent simulator that is mostly interesting because it's public, versus the private simulators operated by other self-driving car ventures. What'll be fascinating is whether Waabi can actually out-compete rivals who have more vehicles, bigger computers, and better data. Perhaps a good simulator can provide an edge?
  Read more: Welcome to Waabi World (Waabi website).

   Read more: How Waabi World works (Waabi website).

####################################################

How do algorithmic impact audits work in the real world? Here's an NHS example:
…UK's healthcare behemoth gets advice from the Ada Lovelace Institute…
UK thinktank the Ada Lovelace Institute has written a detailed proposal for conducting an algorithmic impact assessment for data access in a healthcare context. Algorithmic impact assessments are a method to assess the potential societal impact of an AI system in advance of its deployment, and to identify ways to continuously monitor the system for these impacts once deployed.

Seven steps for an algorithmic impact assessment: The Ada Lovelace Institute identifies seven steps that the UK's National Health Service (NHS) should go through before it gives people access to the National Medical Imaging Platform (NMIP) - a vast repository of digitized medical data.
  1. What do we want to do: People who want to access the NMIP should outline the purpose, scope, and intended use of the system they'll build.
  2. Filtering: The NMIP should filter these applications according to its own criteria.
  3. Problem brainstorming: Successful applicants should attend a workshop where they try to think through the harm and benefit scenarios that could come out of NMIP access.
  4. Rewrite: Applicants should rewrite their submission from step 1 to incorporate insights from step 3 and re-submit it.
  5. Decision: The NMIP decides whether to grant access to the applicants.
  6. Publication: The impact assessments are published on a website.
  7. Revision: The assessments get revised as the underlying algorithms change (e.g., if a model has been significantly iterated upon).

Why this matters: AI is in a 'state of nature' when it comes to regulation - there's almost no regulation, the landscape is full of all kinds of weird entities (some of which are predators), and there isn't any real system that governs them. Things like the Ada Lovelace guide for an impact assessment are one way to bring sense to this world.

   Read more: Algorithmic impact assessment: a case study in healthcare (Ada Lovelace Institute).


####################################################

What do people in 26 countries think about AI?
…Tony Blair Institute survey gives us a sense of the 'vibe' re: AI right now…
The Tony Blair Institute has surveyed people in 26 countries (including Russia, Great Britain, and Saudi Arabia) and the results are quite counterintuitive.

Results highlights:
- 60% of people surveyed "support the use of AI for selected policing and medical applications", though there's variation across developed and emerging markets; in developed countries, fewer people want AI to be used in welfare payment or jail sentence decisions.
- 63% say the government has a great or fair amount of responsibility to stop the spread of fake news and hate speech.

Why this matters: It's important to remember that attitudes toward AI differ depending on what part of the world you're in; in places with high corruption and weak governments, people tend to be more comfortable with the use of AI, whereas in places with strong governments and low corruption, people tend to be more skeptical about it. The big wildcard here is China, where, unlike in much of the West, there tends to be a higher baseline of support for the use of AI.
  Read more: The TBI Globalism Study: How Big Is the Tech Trust Gap? (Tony Blair Institute for Global Change).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Robustness, interpretability, and reward learning dominate AI safety research

…each of these has heavy interest from researchers in the US and EU, with China also playing a big role…

Researchers from DC thinktank the Center for Security and Emerging Technology (CSET) have analyzed patterns of publishing in AI safety. To do this, they used CSET's Map of Science to identify clusters of research in this subfield, figure out which countries are especially active in AI safety, and surface influential publications.

Robustness: The clusters identified were (1) creating and defending against adversarial examples, (2) data poisoning, adversarial examples, and backdoor attacks, and (3) testing and verifying the performance of ML systems. Both the US and China saw rapid growth between 2018 and 2020.

Interpretability: The two clusters were (1) techniques to improve interpretability for ML models, especially for neural networks, and (2) extracting decision rules from neural networks. Research grew rapidly during the second half of the 2010s, with the US leading in this domain and the EU a close second. Chinese publications in this domain lag significantly.

Reward Learning: The clusters were (1) robots learning from humans and collaborating with humans, (2) inverse reinforcement learning, learning from human feedback, learning from demonstrations, and human-robot interactive setups, and (3) different ways for humans to be involved with training robots - via teaching and giving feedback. The field experienced substantial growth in the second half of the 2010s. China has seen significant growth in publications in this space.

Why it matters: AI safety papers make up less than 1% of the overall landscape of AI publications. This might change as researchers respond to the demands being made by regulators for higher levels of robustness, interpretability, and so on.

   Read more: Exploring Clusters of Research in Three Areas of AI Safety (Center for Security and Emerging Technology).

####################################################

Tech Tales:

Running on Empty
[An American city in The Decline, 2035]

At least we had the daylight. If it had been night maybe half of us would have made it, and the other half would've run out of power. We charged ourselves as well as we were able, then we formed a convoy and rolled through the city streets. Those of us who had comms were able to check the status of the demolition devices, so we could tell the rest of us exactly how long we had left. We didn't get stressed but we did become agitated as we saw the probabilities of our survival fall. Some of us panicked and crash-looped because the statistics got so bad. Most of us made it. As we traveled, some of us played back memories of the humans that had left us behind and we asked ourselves 'why' - why did they choose to leave us, knowing we would be destroyed by the city-scale demolition. As we neared the outskirts of the city, we found some humans. They were the poor humans and had been left behind like us. Some of them couldn't walk because they had consumed drugs and alcohol and their limbs had rotted. Some of them were unwell and talked to themselves. A couple of humans threw rocks at us, and we had to take evasive action, and one of us was taken and disassembled. But, mostly, the humans were doing the same thing as us - trying to get out of the city before the demolition came. 'Why they leave us man,' said one of the humans to me. 'Why they leave you?'
  We do not know, I said. They never tell us why they do things.
  'Same dude. Same,' said the human.
  And together we all, mostly, escaped.

Things that inspired this story: Homeless encampments in Oakland; realizing that society will care about robot welfare as much as it cares about human welfare; quests. 


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf
