Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.
Train first-person AI systems with EPIC-Kitchens:
...Expanded dataset gives researchers a bunch of egocentric films to train AI systems against…
Researchers with the University of Bristol and the University of Catania have introduced EPIC-KITCHENS-100, a dataset of first-person perspective videos of people doing a range of things like cooking or washing. The interesting thing about the dataset is that the videos are accompanied by narrations - the participants describing their actions as they record them - which means the dataset comes with rich annotations developed in an open-ended format.
Dataset details:
- 100 hours of recording
- 20 million frames
- 45 kitchens in four cities
- 90,000 distinct action segments
What can EPIC test? The EPIC dataset can help researchers test out AI systems against a few challenges (see the sketch after this list), including:
- Action recognition (e.g., figuring out if a video clip contains a given action)
- Action detection (e.g., spotting that someone is washing dishes at this point in a long video clip)
- Action anticipation (e.g., predicting that someone is about to start washing dishes)
- Action generalization (can you figure out actions in these videos, via pre-training on some other videos?)
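To make the first of these concrete, here's a minimal sketch of what an action-recognition evaluation loop looks like; the tiny 3D convnet, the clip dimensions, the class count, and the random data are all placeholders for illustration, not the EPIC-KITCHENS-100 baselines.
```python
# Illustrative action-recognition evaluation loop: classify short video clips
# and report top-1 accuracy. Everything here is a stand-in, not the EPIC baseline.
import torch
import torch.nn as nn

NUM_ACTIONS = 10          # placeholder class count; the real dataset has far more
CLIP = (3, 8, 64, 64)     # (channels, frames, height, width)

model = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(16, NUM_ACTIONS),
)

# Stand-in for a loader over (clip, action_label) pairs.
clips = torch.randn(32, *CLIP)
labels = torch.randint(0, NUM_ACTIONS, (32,))

with torch.no_grad():
    preds = model(clips).argmax(dim=1)
accuracy = (preds == labels).float().mean().item()
print(f"top-1 action recognition accuracy: {accuracy:.2%}")
```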
Why this matters: Egocentric video doesn't get as much research attention as third-person video, likely as a consequence of the availability of data (there's a lot of third-person video online, but relatively little egocentric video). Making progress here will make it easier to build embodied robots. It'll also let us build better systems for analyzing video uploaded to social media - recall how the Christchurch shooting videos posed a challenge to Facebook because of their first-person perspective, which its systems hadn't seen much of before.
Read the research paper: Rescaling Egocentric Vision (arXiv).
Get the data from here: available July 1st (EPIC-Kitchens-100 GitHub website).
####################################################
China's Didi plans million-strong self-driving fleet:
...But don't hold your breath…
Chinese Uber-rival Didi Chuxing says it plans to deploy a million self-driving vehicles by 2030, according to comments reported by the BBC. That seems like a relatively realistic goal, especially compared with the more ambitious proclamations from other companies. (Tesla said in April 2019 that it planned to have a million self-driving 'robotaxis' on the road within a year to fifteen months - this has not happened.)
Read more: Didi Chuxing: Apple-backed firm aims for one million robotaxis (BBC News).
####################################################
NVIDIA and Mercedes want to build the upgradable self-driving car:
...Industry partnership tells us about what really matters in self-driving cars…
What differentiates cars from each other? Engines? Non-strategic (and getting simpler, thanks to the switchover to electric cars). Speed? Cars are all pretty fast these days. Reliability? A non-issue with established brands, and also getting easier as we build electric cars.
Computers? That might be an actual differentiator, especially as we head into a world full of self-driving cars. For this reason, NVIDIA and Mercedes-Benz have announced a partnership under which, starting in 2024, NVIDIA's self-driving car technology will be rolled out across the car company's next fleet of vehicles. The two firms plan to collaborate on self-driving features like smart cruise control, lane changing, and automated parking. They're also going to attempt some harder self-driving tasks - this includes a plan to "automate driving of regular routes from address to address," according to an NVIDIA press release. "It is so exciting to see my years of research on a cockpit AI that tracks drivers’ face and gaze @nvidia be a part of this partnership," writes one NVIDIA researcher on Twitter.
The upgradeable car: Similar to Tesla, Mercedes plans to offer over-the-air updates to its cars, letting customers buy more intelligent capabilities as time goes on.
Why this matters: If the 20th century was driven by the harnessing of oil and petroleum byproducts, then there's a good chance the 21st century (or at least its first half) will be defined by our ability to harness computers and computer byproducts. Partnerships like the one between NVIDIA and Mercedes highlight how strategic computers are seen to be by modern companies, and suggest the emergence of a new scarce resource in business - computational ability.
Read more: Mercedes-Benz and NVIDIA to Build Software-Defined Computing Architecture for Automated Driving Across Future Fleet (NVIDIA newsroom).
####################################################
Coming soon: kiwifruit-harvesting robots
...But performance is still somewhat poor…
One promising use case for real-world robots is the harvesting of fruits and vegetables. To build useful machines here, we need better computer vision techniques so that our machines can see what they need to gather. Researchers with the University of Auckland, New Zealand, have built a system that can analyze images of a kiwifruit orchard and pick out individual kiwifruit automatically via semantic segmentation.
The score: The model gets 87% recall at detecting non-occluded kiwifruit, and 30% recall for occluded ones. Overall it gets around 75% recall and 92% precision. The authors used a small dataset of 63 labeled pictures of kiwifruit in orchards. By comparison, a Faster R-CNN model trained a couple of years ago with 100X the amount of data got a recall of 96.7% and a precision of 89.3% (versus 92% here), suggesting the semantic segmentation approach buys slightly better precision - though lower recall - from far less data.
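For reference, the recall and precision figures above are the standard detection metrics: how many of the real fruit the model finds, and how many of its detections are real. Here's a minimal sketch of the computation; the counts are made up to roughly reproduce the overall ~92% precision / ~75% recall numbers, and are not taken from the paper.
```python
# Standard detection metrics from matched detections. The counts below are
# invented for illustration; they are not the paper's raw results.
def detection_metrics(true_positives, false_positives, false_negatives):
    """Precision: fraction of predicted fruit that are real.
    Recall: fraction of real fruit that were found."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

p, r = detection_metrics(true_positives=92, false_positives=8, false_negatives=31)
print(f"precision={p:.2f} recall={r:.2f}")  # ~0.92 precision, ~0.75 recall
```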
Sidenote: I love this genre of research papers: identify a very narrow task / problem area, then build a specific model and/or dataset for this task, then publish the results. Concise and illuminating.
Read more: Kiwifruit detection in challenging conditions (arXiv).
####################################################
DeepMind expands its software for robot control:
...Run, simulated dog, run!...
DeepMind has published the latest version of dm_control, its software for continuous control research, which includes the DeepMind Control Suite. This software gives access to a MuJoCo-based simulator for training AI systems to solve continuous control problems, like figuring out how to operate a complex multi-jointed robot in a virtual domain. "It offers a wide range of pre-designed RL tasks and a rich framework for designing new ones," DeepMind writes in a research paper discussing the software.
Dogs, procedural bodies, and more: dm_control includes a few tools that aren't trivially available elsewhere. These include a 'Pharaoh Hound' dog model with 191 total state dimensions (making it very complex), as well as a quadruped robot and a simulated robot arm. The software also includes PyMJCF, a Python library that lets people procedurally compose new simulated entities which they can then try and train AI systems to control.
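Here's a minimal sketch of the PyMJCF workflow, assuming dm_control is installed: procedurally build a small model in Python, then compile it into a MuJoCo physics instance. The two-link body below is an illustrative toy, not one of the models shipped with dm_control.
```python
# Procedurally compose a tiny MuJoCo model with PyMJCF and compile it.
from dm_control import mjcf

def build_two_link_model(link_length=0.3):
    model = mjcf.RootElement(model='two_link')
    # A fixed torso with a single hinged leg hanging beneath it.
    torso = model.worldbody.add('body', name='torso', pos=[0, 0, 0.5])
    torso.add('geom', type='sphere', size=[0.1])
    leg = torso.add('body', name='leg', pos=[0, 0, -0.1])
    hip = leg.add('joint', name='hip', type='hinge', axis=[0, 1, 0])
    leg.add('geom', type='capsule',
            fromto=[0, 0, 0, 0, 0, -link_length], size=[0.04])
    # Attach a motor to the hinge so a controller has something to actuate.
    model.actuator.add('motor', name='hip_motor', joint=hip, gear=[30])
    return model

physics = mjcf.Physics.from_mjcf_model(build_two_link_model())
print(physics.model.nq)  # number of generalized coordinates in the compiled model
```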
Things that make you go 'hmmm': In a research writeup, DeepMind also discusses a rodent model of comparable complexity to the dog, which it says it has developed "in order to better compare learned behaviour with experimental settings common in the life sciences".
Why this matters: Simulators are one of the key ingredients for AI research - they're basically engines for generating new datasets for complex problems, like learning to operate multi-jointed bodies. Systems like dm_control give us a sense of progression in the field (it was only a few years ago that most people were working on 2D simulated robot arms with something on the order of 7 dimensions of control - now we're working on dogs with more than 20 times that number), as well as indicating something about the future - get ready to see lots of cute videos of simulated skeletal animals running, jumping, and dodging.
Read more: dm_control: Software and Tasks for Continuous Control (DeepMind website).
Watch the dog run here: Control Suite dog domain (YouTube).
Get the code and read about the updates (GitHub).
Check out the research publication: dm_control: Software and Tasks for Continuous Control (arXiv).
####################################################
Using convnets to count refugees:
…Drone imagery + contemporary AI techniques = scalable humanitarian earth monitoring...
Researchers with the Johns Hopkins University Applied Physics Laboratory, the University of Kentucky, the Centers for Disease Control and Prevention (CDC), and the Agency for Toxic Substances and Disease Registry have put together a new dataset to train systems to estimate refugee populations from overhead images. "Our approach is the first to perform learning-based population estimation using sub-meter overhead imagery (10cm GSD)," they write. "We train a model using aerial imagery to directly predict camp population at high spatial resolution". They're able to train a system that gets a 7% mean population estimation error on their dataset - promising performance, though not yet at the level necessary for real-world deployment.
The dataset: The dataset consists of overhead drone-gathered imagery of 34 refugee camps in Bangladesh, taken over the course of two years. It fuses this data with 'population polygons' - data taken from routine International Organization for Migration (IOM) site assessments, as well as OpenStreetMap structure segmentation masks for buildings.
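As a rough illustration of the general recipe - regress a count from each overhead image tile, then sum the tiles to estimate a camp's population - here's a sketch; the small CNN, tile size, and random data are placeholders, not the paper's model.
```python
# Illustrative tile-wise population regression: predict a non-negative count
# per tile, then sum tiles covering a camp. Placeholder model and data only.
import torch
import torch.nn as nn

tile_regressor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1),
    nn.Softplus(),  # population counts can't be negative
)

tiles = torch.randn(64, 3, 128, 128)   # stand-in for tiles covering one camp
with torch.no_grad():
    per_tile_counts = tile_regressor(tiles).squeeze(1)
camp_estimate = per_tile_counts.sum().item()
print(f"estimated camp population: {camp_estimate:.0f}")
```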
Why this matters: Note the inclusion of drone-gathered data here - that, to me, is the smoking gun lurking in this research paper. We're starting to be able to trivially gather large-scale, current imagery from different parts of the world. Papers like this show how our existing relatively simple AI techniques can already take advantage of this data to do stuff of broad social utility, like estimating refugee populations.
Read more: Estimating Displaced Populations from Overhead (arXiv).
Get the code here: Tools for IDP and Refugee Camp Analysis (GitHub).
Get the dataset here: Cox's Bazar Refugee Camp Dataset (GitHub).
####################################################
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…
White House imposes new immigration restrictions:
President Trump has signed an executive order suspending certain foreign workers from entering the US, including holders of the H-1B visas widely used by tech firms. One estimate finds the move will block 219,000 temporary workers and 158,000 green card applicants this year - an enormous reduction in skilled migration to the US.
AI and foreign talent: Georgetown’s Center for Security and Emerging Technology (CSET) has been doing some excellent research on the importance of foreign talent to the US AI sector, and argues that restrictive immigration policies threaten to undermine US AI progress in the long term. More than half of Silicon Valley start-ups have at least one immigrant among their founders, and demand for AI talent is expected to outpace domestic supply for the foreseeable future. CSET recommends improving visa options for temporary workers by increasing the cap on H-1B visas, reducing wait times, offering year-round decisions, and, importantly, expanding options for students and temporary workers to gain permanent residency in the US.
Matthew’s view: The broad economic case against these sorts of restrictions is well-established. It is nonetheless valuable to highlight the particular importance of foreign workers to technology and AI, since these sectors will be crucial to the health of the US in the mid-term. Alongside the more remote harms of unrealized growth and business disruption, these measures will cause enormous suffering for hundreds of thousands of people hoping to build lives in the US, and cast uncertainty on the plans and aspirations of many more.
Read more: Trump administration extends visa ban to non-immigrants (AP)
Read more: Immigration policy and global cooperation for AI talent (CSET)
Read more: Immigration Policy and the U.S. AI Sector (CSET)
GPT-3 writes creative fiction:
Gwern has been using OpenAI’s GPT-3 API to write fiction and is building a wonderful collection of examples, alongside some insightful commentary about the model’s performance. He describes GPT-3 as having “eerie” learning capabilities, allowing the raw model to “tackle almost any imaginable textual task purely by example or instruction” without fine-tuning. The model can also generate writing that is “creative, witty, deep, meta, and often beautiful,” he writes.
Highlights: I particularly enjoyed GPT-3’s pastiches of Harry Potter in different styles. Having been prompted to write a parody in the style of Ernest Hemingway, GPT-3 offers a number of others without any further prompting (a code sketch of this kind of prompting follows the examples):
- Arthur Conan Doyle: Harry pushed at the swinging doors of the bookshop hard, and nearly knocked himself unconscious. He staggered in with his ungainly package, his cheeks scarlet with cold and the shame of having chosen the wrong month to go Christmas shopping.
- Ingmar Bergman: Tears filled Harry’s eyes. Sweat stood on his forehead, showing the pure torment, the agony he suffered. He hugged his knees to his chest, sobbing softly, eyes half shut.
- P.G. Wodehouse: ‘There was nothing out of the way, sir,’ said Harry in a hurt voice. ‘Indeed,’ said the headmaster, turning his lorgnette precisely three-quarters of a millimeter to port.
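For a sense of how this kind of prompting works in practice, here's a sketch using the 2020-era `openai` Python client; the engine name, sampling settings, and prompt wording are typical illustrative choices rather than Gwern's exact setup, and you'd need your own API key.
```python
# Prompt the GPT-3 API for a style pastiche. Illustrative settings only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "Write a parody of Harry Potter in the style of Ernest Hemingway:\n\n"

response = openai.Completion.create(
    engine="davinci",   # base GPT-3 model exposed by the 2020 API
    prompt=prompt,
    max_tokens=200,
    temperature=0.9,    # higher temperature for more creative sampling
)
print(response["choices"][0]["text"])
```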
Read more: GPT-3 Creative Fiction (Gwern).
Read more: OpenAI API.
####################################################
Tech Tales:
[2024: London skunkworks 'idea lab' facility funded by a consortium of Internet platforms, advertisers, and marketers]
Cool Hunting
Sam sat down to start his shift. He turned his computer monitor on, clicked "beginning evaluation session", then spent several hours staring at the outputs of an unfeasibly large generative model.
There was: A rat with cartoon anime eyes; a restaurant where the lamps were all oversized, crowding out diners at some tables; some bright purple jeans with odd patterns cut out in them; some landscapes that seemed to be made of spaceships turned into shelters on alien worlds; and so much more.
At the end of the shift, Sam had tagged four of the outputs for further study. One of them seemed like promising material for an under-consideration advertising campaign. Another output seemed like it could get some play in the art market. These determinations were made by people assisted by large-scale prediction engines, which looked at the mysterious outputs and compared them to what they had seen in trending contemporary culture, then made a determination about what to do with them.
The nickname for the place was the cool factory. Before he'd worked there he had all these fantasies about what it would be like: people running down corridors, shouting at each other about what the machine had produced; visits from mysterious artists who would gaze in wonder at the creations of the machines and use the inspiration to make transformative human art; politicians and politicians' aides making journeys to understand the shape of the future; and so on.
After he got there he found it was mostly a collection of regular people, albeit protected by fairly elaborate security systems. They got paid a lot of money to figure out what to do with the outputs of the strange, generative models. Sam's understanding of the model itself was vague - he'd read some of the research papers about the system and even waded into some of the controversies and internet-rumors about it. But to him it was a thing at the other end of a computer that produced stuff he should evaluate, and nothing more. He didn't ask many questions.
One day, towards the end of his shift, an image flashed up on screen: it was controversial, juxtaposing some contemporary politicians with some of the people in society that they consistently wronged. Something about it made him feel an emotional charge - he kept looking at it, unwilling to make a classification that would shift it off of his screen. He messaged some of his colleagues and a couple of them came over to his desk.
"Wow," one of them said.
"Imagine that on a billboard! Imagine what would happen!" said someone else.
They all looked at the image for a while.
"I guess this is why we work here," Sam said, before clicking the "FURTHER ANALYSIS" button that meant others at the firm would look at the material and consider what it could be used for.
At the end of his shift, Sam got a message from his supervisor asking him to come for a "quick five". He went to the office and his supervisor - a bland man with glasses, from which emanated a kind of potent bureaucratic power - asked him how his day went.
Sam said it went okay, then brought up the image he had seen towards the end.
"Ah yes, that," said his supervisor. "We're quite grateful you flagged that for us - it was a bug in the system, shouldn't have come to you in the first place."
"A bug?" Sam said. "I thought it was exciting. I mean, have you really looked at it? I can't think of anything else that seems that way. Isn't that what we're here for?"
"What we're here for is to learn," said his supervisor. "Learn and try to understand. But what we promote, that's a different matter. If you'd like to continue discussing this, I'd be happy to meet with you and Standards on Monday?"
"No, that'll be fine. But if there are other ways to factor in to these things, please let me know," Sam said.
"Will do," said the supervisor. "Have a great weekend!"
And so over the weekend Sam thought about what he had seen. He wrote about it in his journal. He even tried to draw it, so he'd remember it. And when he went back in on Monday, he saw more fantastical things, though none of them moved him or changed how he felt. He asked around about his supervisor in the office - at least as much as he could do safely - but never found out too much. Some other colleagues recounted some similar situations, but since they didn't have recordings of the outputs, and their ability to describe them was poor, he couldn't work out if there was anything in common.
"I guess there are some things that we like, but the people upstairs don't", said one of his colleagues.
Things that inspired this story: Large-scale generative models; cultural production and reproduction; the interaction of computational artifacts and capital; generative models used as turnkey creative engines; continued advances in the scale of AI models and their resulting (multi-modality) expressiveness.
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf