Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
Want a 20B parameter GPT-style language model? Go here!
…Eleuther releases the largest public open source AI model…
Last week, we wrote about how Eleuther was about to release a 20B parameter language model. Now, they have.
Get the model here (Eleuther, GitHub).
Read the research paper: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (PDF).
####################################################
Want a language model that actually knows about COVID? You might need a Diachronic model:
…Models trained on newer data do better - try them yourself…
Researchers with the University of Porto, Snap Inc., and Cardiff NLP have built a family of so-called 'time-aware' BERT-style language models, trained on Twitter data. The craziest part of this is that they're committing to "keep updating and releasing a new model every three months, effectively enabling the community to make use of an up-to-date language model at any period in time".
What the problem is: Most language models are trained on a dataset, then never updated. That means that some language models might have no knowledge of minor events like the global COVID pandemic. This is obviously a problem and the solution is simple (albeit labor-intensive) - periodically gather new data and re-train models.
What they did: They train a base RoBERTa model using Twitter data that cuts off in 2019, made up of 90 million tweets. Then, for every three months that elapses after that, they add 4.2 million tweets into the dataset and train a new model. At the time of writing, they've trained nine models in total, with the latest model (2021-Q4) being trained on 123.86 million tweets. The theory is that newer models should perform better on more modern tasks and evaluations.
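If you want to poke at the time-aware behavior yourself, here's a minimal sketch (not from the paper) that uses the Hugging Face transformers library to compare mask-filling between an older and a newer checkpoint. Note that the model identifiers below are my assumptions - check the Cardiff NLP release for the actual names.

# Minimal sketch: compare what an older and a newer TimeLMs checkpoint
# predict for a masked token. The model names are assumptions -- check the
# Cardiff NLP release for the real identifiers.
from transformers import pipeline

OLD_MODEL = "cardiffnlp/twitter-roberta-base-2019-90m"  # assumed: base model, tweets up to end of 2019
NEW_MODEL = "cardiffnlp/twitter-roberta-base-dec2021"   # assumed: the 2021-Q4 model

prompt = "So glad I'm <mask> vaccinated."  # RoBERTa-style mask token

for name in (OLD_MODEL, NEW_MODEL):
    fill_mask = pipeline("fill-mask", model=name)
    print(name)
    for candidate in fill_mask(prompt)[:3]:  # top-3 predictions
        print(f"  {candidate['token_str']}  ({candidate['score']:.3f})")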
How well does it do? They compare their models against a few baselines, including BERTweet (which was trained on ~900m tweets). In tests, their models beat BERTweet on six out of seven benchmarks, though BERTweet still gets the best overall performance. These aren't strictly 'time-aware' evaluations, though; they just test classification abilities for things like emotion, irony, and stance. In separate time-aware tests, they find that pseudo-perplexity (PPPL) tends to increase by about 10% for each year by which a model is out of date - in other words, a model gets roughly 10% worse at modeling contemporary text for every year it goes without an update. "This result reinforces the need for updated language models even for short time periods," the researchers write.
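(For reference, pseudo-perplexity for a masked language model is typically computed by masking each token in turn, scoring the true token, and exponentiating the negative mean log-likelihood. Here's a rough sketch of that calculation - the checkpoint name is an assumption, and the paper's exact evaluation setup will differ.)

# Rough sketch of pseudo-perplexity (PPPL) for a masked language model;
# illustrative only, not the paper's exact evaluation code.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-2019-90m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def pseudo_perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    log_probs = []
    for i in range(1, len(ids) - 1):  # skip the <s> and </s> special tokens
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return float(torch.exp(-torch.tensor(log_probs).mean()))

print(pseudo_perplexity("Tweets about the news drift over time."))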
Why this matters: AI models naturally freeze-dry the cultural landscape they're trained on, meaning that if we don't get good at updating our models, we'll end up trapped in a world where many of our AI systems are outputting things relevant to prior eras and cultural trends - this will make them less useful, and holds the potential for creating feedback loops around cultural stagnation. AI models are weird mirrors of society, so we need to remake them as society changes.
Read more: TimeLMs: Diachronic Language Models from Twitter (arXiv).
Get the models here (Cardiff NLP, Twitter).
####################################################
U.S. Army gets smart, semi-autonomous personal drones:
…Skydio gets a $20m a year contract…
Skydio, the company that makes drones which can navigate themselves semi-autonomously, has won a contract with the U.S. Army worth up to $99.8m over five years. Skydio was selected as part of the Army's procurement initiative for small, personal drones - the Short Range Reconnaissance (SRR) Program of Record - after the Army evaluated 30 small drone vendors. "Skydio drones deliver unparalleled situational awareness and ease of use in the most demanding situations thanks to Skydio Autonomy," said Skydio CEO, Adam Bry, in a press release.
Things that start out as toys become weapons: Skydio started out selling drones advertised to sports enthusiasts who wanted something that could follow and film them as they ran around, snowboarded, hiked, climbed cliffs, or did any other high-octane Type A personality activity. It's funny how, after a few years of development, the company is now getting into the military. Many toys for rich people ultimately become weapons (and vice versa).
Why this matters: For many years, militaries have been centaurs - collectives of humans and machines working together. This has mostly happened at high levels of abstraction: satellites provide information to people managing teams, or teams of humans use bomb-disposal robots to deal with IEDs. With things like the Skydio contract, we're entering the era of the personal centaur - small groups of soldiers, or even individuals, will have their own little machine emissaries with which to conduct operations.
Read more: U.S. Drone Maker Skydio Wins Production Other Transaction (OT) Agreement for U.S. Army Short Range Reconnaissance Program (Skydio).
####################################################
Simulators are the new platforms: Waabi unveils a self-driving car sim:
…Raquel Urtasun's startup wants to build a business on simulators…
Waabi, a self-driving car startup run by the former head of Uber's self-driving research team, Raquel Urtasun, has announced 'Waabi World', a simulator for training self-driving cars.
Distinguishing features: Waabi claims it is "the most scalable, highest fidelity closed-loop simulator ever" (I somehow doubt Tesla or Waymo would agree, but hey, they're not talking about their sims!). The simulator has four main features:
- High fidelity world simulation: Uses AI to reconstruct real-world geometry, appearance, and material properties.
- High-fidelity sensor simulation: Uses AI and physics-based rendering "to simulate realistic sensor data in near real-time".
- Automatic stress-testing: Automatically generates challenging traffic scenarios to test out the simulated cars against.
- Reinforcement learning: Waabi appears to use RL to update the car agents so they can learn to drive in the simulation (the announcement is very fluffy and never actually says 'RL', but that's what I infer - see the toy sketch of this kind of closed loop below).
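Waabi hasn't published an API, so treat the following as a purely hypothetical toy - none of the names or numbers come from Waabi - but it shows the closed-loop pattern the announcement implies: generate a scenario, roll out the driving agent, score it, and make the scenarios harder over time.

# Purely hypothetical toy of a closed-loop sim/agent setup; nothing here is
# Waabi's actual API or method. The 'agent' is just a skill scalar and the
# 'scenario' a difficulty scalar, to keep the loop easy to see.
import random

def rollout(skill: float, difficulty: float) -> float:
    """Simulate one drive: return the fraction of hazards the agent handles."""
    hazards = 1 + int(difficulty * 9)
    handled = sum(random.random() < skill / (skill + difficulty) for _ in range(hazards))
    return handled / hazards

skill, difficulty = 0.2, 0.1
for episode in range(500):
    score = rollout(skill, difficulty)
    skill += 0.01 * (1.0 - score)        # 'learning': improve more when the agent struggles
    if score > 0.9:                      # 'automatic stress-testing': harder scenarios once mastered
        difficulty = min(1.0, difficulty * 1.1)
print(f"final skill={skill:.2f}, final difficulty={difficulty:.2f}")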
Why this matters: Waabi World seems like a decent simulator that is mostly interesting because it's public, versus the private simulators operated by other self-driving car ventures. What'll be fascinating is whether Waabi can actually out-compete rivals who have more vehicles, bigger computers, and better data. Perhaps a good simulator can provide an edge?
Read more: Welcome to Waabi World (Waabi website).
Read more: How Waabi World works (Waabi website).
####################################################
How do algorithmic impact audits work in the real world? Here's an NHS example:
…UK's healthcare behemoth gets advice from the Ada Lovelace Institute…
UK thinktank the Ada Lovelace Institute has written a detailed proposal for conducting an algorithmic impact assessment for data access in a healthcare context. Algorithmic impact assessments are a method to assess the potential societal impact of an AI system in advance of its deployment, and to identify ways to continuously monitor the system for these impacts once deployed.
Seven steps for an algorithmic impact assessment: The Ada Lovelace Institute identifies seven steps that the UK's National Health Service (NHS) should go through before it gives people access to the National Medical Imaging Platform (NMIP) - a vast repository of digitized medical data. (A toy code sketch of this workflow follows the list.)
1. What do we want to do: People who want to access the NMIP should outline the purpose, scope, and intended use of the system they'll build.
2. Filtering: The NMIP should filter these applications according to its own criteria.
3. Problem brainstorming: Successful applicants should attend a workshop where they try and think through the harm and benefit scenarios that could come out of NMIP access.
4. Rewrite: Applicants should rewrite their submission from step 1 to incorporate insights from step 3, then re-submit it.
5. Decision: The NMIP decides whether to grant applicants access.
6. Publication: The impact assessments are published on a website.
7. Revision: The assessments get revised as the underlying algorithms change (e.g., if a model has been significantly iterated upon).
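Purely as an illustration - the report describes a governance process, not software - here's a toy sketch that encodes the seven steps as a simple state machine, which makes the flow easy to see at a glance.

# Toy sketch only: the Ada Lovelace report describes a governance process,
# not software. This just encodes the seven steps so the flow is explicit.
from enum import Enum, auto

class Step(Enum):
    OUTLINE = auto()      # 1. describe purpose, scope, and intended use
    FILTER = auto()       # 2. NMIP filters applications against its criteria
    WORKSHOP = auto()     # 3. brainstorm potential harms and benefits
    REWRITE = auto()      # 4. revise the outline and re-submit
    DECISION = auto()     # 5. NMIP grants or refuses access
    PUBLICATION = auto()  # 6. assessment published on a website
    REVISION = auto()     # 7. revisit as the underlying algorithms change

NEXT = {
    Step.OUTLINE: Step.FILTER,
    Step.FILTER: Step.WORKSHOP,
    Step.WORKSHOP: Step.REWRITE,
    Step.REWRITE: Step.DECISION,
    Step.DECISION: Step.PUBLICATION,
    Step.PUBLICATION: Step.REVISION,
}

step = Step.OUTLINE
while step in NEXT:
    print(step.name)
    step = NEXT[step]
print(step.name)  # REVISION: an ongoing obligation rather than a one-off step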
Why this matters: AI is in a 'state of nature' when it comes to regulation - there's almost no regulation, the landscape is full of all kinds of weird entities (some of which are predators), and there isn't any real system that governs them. Things like the Ada Lovelace guide for an impact assessment are one way to bring sense to this world.
Read more: Algorithmic impact assessment: a case study in healthcare (Ada Lovelace Institute).
####################################################
What do people in 26 countries think about AI?
…Tony Blair Institute survey gives us a sense of the 'vibe' re: AI right now…
The Tony Blair Institute has surveyed people in 26 countries (including Russia, Great Britain, and Saudi Arabia) and the results are quite counterintuitive.
Results highlights:
- 60% of people surveyed "support the use of AI for selected policing and medical applications", though there's variation between developed and emerging markets: in developed countries, fewer people want AI to be used in welfare payment or jail sentence decisions.
- 63% say the government has a great or fair amount of responsibility to stop the spread of fake news and hate speech.
Why this matters: It's important to remember that attitudes around AI differ depending on what part of the world you're in; in places with high corruption and weak governments, people tend to be more comfortable with the use of AI, whereas in places with strong governments and low corruption, people tend to be more skeptical about it. The big wildcard here is China, where unlike in much of the West there tends to be a higher amount of inbuilt support for the use of AI.
Read more: The TBI Globalism Study: How Big Is the Tech Trust Gap? (Tony Blair Institute for Global Change).
####################################################
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute
Robustness, interpretability, and reward learning dominate AI Safety research
…each of these has heavy interest from researchers in the US and EU, with China also playing a big role…
Researchers from DC thinktank the Center for Security and Emerging Technology have analyzed patterns of publishing in AI safety. To do this, they used CSET's Map of Science to identify patterns of publishing in this AI subfield, figure out which countries are especially active in AI safety, and surface influential publications.
Robustness: The clusters identified were (1) creating and defending against adversarial examples, (2) data poisoning, adversarial examples, and backdoor attacks, and (3) testing and verifying the performance of ML systems. Both the US and China saw rapid growth between 2018 and 2020.
Interpretability: The two clusters were (1) techniques to improve interpretability for ML models, especially for neural networks, and (2) extracting decision rules from neural networks. Research grew rapidly during the second half of the 2010s, with the US leading in this domain and the EU a close second. Chinese publications in this domain lag significantly.
Reward Learning: The clusters were (1) robots learning from humans and collaborating with humans, (2) inverse reinforcement learning, learning from human feedback, learning from demonstrations, and human-robot interactive setups, and (3) different ways for humans to be involved with training robots - via teaching and giving feedback. The field experienced substantial growth in the second half of the 2010s. China has seen significant growth in publications in this space.
Why it matters: AI safety papers make up less than 1% of the overall landscape of AI papers. This might change as researchers respond to demands from regulators for higher levels of robustness, interpretability, and so on.
Read more: Exploring Clusters of Research in Three Areas of AI Safety (Center for Security and Emerging Technology).
####################################################
Tech Tales:
Running on Empty
[An American city in The Decline, 2035]
At least we had the daylight. If it had been night maybe half of us would have made it, and the other half would've run out of power. We charged ourselves as well as we were able, then we formed a convoy and rolled through the city streets. Those of us who had comms were able to check the status of the demolition devices, so we could tell the rest of us exactly how long we had left. We didn't get stressed but we did become agitated as we saw the probabilities of our survival fall. Some of us panicked and crash-looped because the statistics got so bad. Most of us made it. As we traveled, some of us played back memories of the humans that had left us behind and we asked ourselves 'why' - why did they choose to leave us, knowing we would be destroyed by the city-scale demolition. As we neared the outskirts of the city, we found some humans. They were the poor humans and had been left behind like us. Some of them couldn't walk because they had consumed drugs and alcohol and their limbs had rotted. Some of them were unwell and talked to themselves. A couple of humans threw rocks at us, and we had to take evasive action, and one of us was taken and disassembled. But, mostly, the humans were doing the same thing as us - trying to get out of the city before the demolition came. 'Why they leave us man,' said one of the humans to me. 'Why they leave you?'
We do not know, I said. They never tell us why they do things.
'Same dude. Same,' said the human.
And together we all, mostly, escaped.
Things that inspired this story: Homeless encampments in Oakland; realizing that society will care about robot welfare as much as it cares about human welfare; quests.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf