Import AI 252: Gait surveillance; a billion Danish words; DeepMind makes phone-using agents

As off-the-shelf AI advances, the potential for emergence increases; in a few years, perhaps some of our most impactful AI systems will be assembled at home by hobbyists out of pre-built components. 

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Synthetic data works for gait generation as well (uh oh):
...Generating movies of 10,000 fake people walking, then using them for surveillance…
Gait detection is the task of identifying a person by the way they walk. Now, researchers with Zhejiang University in China have built VersatileGait, a dataset of 10,000 simulated individuals walking, with 44 distinct views available for each individual. The purpose of VersatileGait is to augment existing gait datasets collected from reality. In tests, the researchers show the synthetic data can be used as an input for training gait-detection systems which subsequently get used in the real world.

What they used: To build this dataset, they used an open source tool called 'Make Human' to generate different character models, collected 100 walking animations from a service called 'Mixamo', then animated various permutations of characters+walks in the game engine Unity3D.
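
To make the "permutations of characters+walks" idea concrete, here is a minimal Python sketch of how such a render list could be enumerated. It is not the authors' pipeline: the function name, the ID formats, and the assumption that every character is paired with every walk and every view are all hypothetical, and the counts are shrunk for illustration.

```python
from itertools import product


def build_render_jobs(characters, animations, views):
    """Return one (character, animation, view) tuple per clip to render.

    Hypothetical helper: assumes a full cross product, which the
    write-up above does not confirm.
    """
    return list(product(characters, animations, views))


# Toy sizes; the dataset itself is described as 10,000 individuals,
# 100 Mixamo walking animations, and 44 views per individual.
characters = [f"character_{i:05d}" for i in range(3)]
animations = [f"walk_{i:03d}" for i in range(2)]
views = [f"view_{i:02d}" for i in range(44)]

jobs = build_render_jobs(characters, animations, views)
print(len(jobs))  # 3 * 2 * 44 = 264
```

Each tuple would then be handed to whatever renders the clip (Unity3D in the paper); that step is left out here because its interface isn't described.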

Synthetic data and ethics: "Since all of our data are collected by computer simulation, there will be no problems for privacy preservation. Therefore, our dataset is in agreement with the ethics of research and has no risks for use," the authors write.

Why this matters: Being able to automatically surveil and analyze people is one of those AI capabilities that will have a tremendous impact on the world and (excluding computer vision for facial recognition) is broadly undercovered by pretty much everyone. Gait recognition is one of the frontier areas for the future of surveillance - we should all pay more attention to it.
  Read more: VersatileGait: A Large-Scale Synthetic Gait Dataset Towards in-the-Wild Simulation (arXiv).

###################################################

Care about existential risk? Apply to be the Deputy Director at CSER (UK):
The Centre for the Study of Existential Risk, a Cambridge University research center, is hiring a deputy director. "We’re looking for someone with strong experience in operations and strategy, with the interest and intellectual versatility to engage with and communicate CSER’s research. The role will involve taking full operational responsibility for the day-to-day activities of the Centre, including people management and financial management, and contributing to strategic planning for the Centre," I'm told. The deadline for applications is Sunday July 4th.
  Find out more and apply here (direct download PDF).

###################################################

DeepMind wants to teach AI agents to use Android phones:
...AndroidEnv is an open source tool for creating phone-loving AIs…
DeepMind has released AndroidEnv, a software program that lets you train AI agents to solve tasks in the 'Android' phone operating system. To start with, DeepMind has shipped AndroidEnv with 100 tasks across 30 applications, ranging from playing games (e.g., 2048, Solitaire) to navigating the user interface to set a time.

AndroidEnv lets "RL agents interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface". And because the agents train on a realistic simulation of Android, they can be deployed on real devices once trained, DeepMind says.

Strategic games! DeepMind is also working with the creators of a game called Polytopia to add it as a task for AndroidEnv agents. Polytopia is a game that has chewed up probably several tens of hours of my life over the years - it's a fun little strategy game which is surprisingly rich, so I'll be keen to see how AI agents perform on it.

Why this matters: Eventually, most people are going to have access to discrete AI agents, continually trained on their own data, and working as assistants to help them in their day-to-day lives. Systems like AndroidEnv make it easy to start training AI agents on a massively widely-used piece of software, which will ultimately make it easier for us to delegate more complex tasks to AI agents.
Read more: AndroidEnv: The Android Learning Environment (DeepMind).
Find out more: AndroidEnv: A Reinforcement Learning Platform for Android (arXiv).
Get the code: AndroidEnv - The Android Learning Environment (DeepMind, GitHub).

###################################################

Want to test your AI on a robot but don't have a robot? Enter the 'Real Robot Challenge' for NeurIPS 2021:
...Robot learning competition gives entrants access to a dexterous manipulator…
Robots are expensive, hard to program, and likely important to the future of AI. But the first two parts of that sentence tell you why we see relatively less AI stuff applied to robots than to traditional software. For a few years, a competition hosted by the Max Planck Institute for Intelligent Systems has tried to change this by giving people access to a real robot (a TriFinger), which they can run algorithms on.

What the competition involves: "Participants will submit their code as they would for a cluster, and it will then be executed automatically on our platforms. This will allow teams to gather hundreds of hours of real robot data with minimal effort," according to the competition website. "The teams will have to solve a series of tasks ranging from relatively simple to extremely hard, from pushing a cube to picking up a pen and writing. The idea is to see how far the teams are able to push, solving the most difficult tasks could be considered a breakthrough in robotic manipulation."

Key dates: June 23rd is the date for submissions for the first stage of the competition; successful entrants will subsequently get access to real robot systems.
Find out more about the competition here (Real Robot Challenge website).

###################################################

Detecting scorpions with off-the-shelf-AI:
...Argentinian researchers demonstrate how easy computer vision is getting…
Here's a fun and practical paper about using off-the-shelf AI tools to build an application that can classify different types of scorpions and tell the difference between dangerous and non-dangerous ones. The research was done by the Universidad Nacional de La Plata in Argentina, and saw researchers experiment with YOLO(v4) and MobileNet(v2) for the task of scorpion detection, while using the commercial service 'Roboflow' for data augmentation and randomization. They're ultimately able to obtain accuracies of 88% and 91% across the YOLO and MobileNet methods, and recall values of 90% and 97%, respectively.
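
For readers who want to see how accuracy and recall differ, here is a small Python sketch using the standard definitions. The counts are invented purely for illustration and are not taken from the paper; the function names are my own.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of truly dangerous scorpions that the model flagged."""
    return tp / (tp + fn)


# Made-up confusion-matrix counts for a "dangerous vs. not" classifier.
# tp: dangerous, flagged   fn: dangerous, missed
# tn: harmless, cleared    fp: harmless, flagged
tp, fp, tn, fn = 45, 8, 42, 5

print(f"accuracy={accuracy(tp, fp, tn, fn):.2f}")  # 87 / 100
print(f"recall={recall(tp, fn):.2f}")              # 45 / 50
```

For a health-security tool, recall is arguably the number to watch: a false negative means a dangerous scorpion was waved through, whereas a false positive just means extra caution.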

Why this matters: Papers like this highlight how people are doing standard/commodity computer vision tasks today. What I found most surprising was the further evidence that primitives like YOLO and MobileNet are sufficiently good that they don't need much adaptation, and that academics are now starting to use more commercial services to help them in their research (e.g., you could do what Roboflow does yourself but… why would you? It doesn't cost that much and maybe it's better than ImageMagick etc).
Read more: Scorpion detection and classification systems based on computer vision and deep learning for health security purposes (arXiv).

###################################################

A Danish billion-word corpus appears:
...the Danish Gigaword Corpus will make it easier to train GPT2-style models to reflect digitized Danish culture...
Researchers with the IT University of Copenhagen have built the Danish Gigaword Corpus, which consists of 1,045 million (1.05 billion) Danish words, drawn from sources ranging from Danish social media, to law and tax codes, to Wikipedia, literature, news, and more. The corpus is licensed via the Creative Commons general license (CC0) and CC-BY.

Why this matters: "In Denmark, natural language processing is nascent and growing faster and faster," the authors write. "We hope that this concrete and significant contribution benefits anyone working with Danish NLP or performing other linguistic activities". More broadly, in AI, data does equate to representation - so now that there's a nicely filtered billion-word dataset of Danish text available, we can expect more groups to train more Danish language models, translation models, and so on.
  Read more: Gigaword (official website).
Read the paper: The Danish Gigaword Corpus (PDF).

###################################################

Tech Tales:

The Religion Virus
[Worldwide, 2026]

It started as a joke from some Mormon comp. sci. undergrads, then it took over most of the computers at the university, then the computers of the other universities linked to the high-speed research infrastructure, then it spread to the internet. Now, we estimate more than a million person years of work have been expended trying to scrub the virus off of all the computers it has found. We estimate we're at 80% containment, but that could change if it self-modifies again.

As a refresher, the virus - dubbed True Believer - is designed to harvest the cycles of both the machines it deploys onto and the people that use those machines. Specifically, once it takes over a machine it starts allocating a portion of the computer's resources to onward propagating the virus (normal), as well as using computational cycles to train a large multilingual neural net on a very large dataset of religious texts (not normal). The only easy way to turn the virus off is to activate the webcam on the computer, then it'll wait to see if a human face is present; if the face is present, the virus starts showing religious texts and it uses some in-virus eye-tracking software to check if the person is 'reading' the texts. If the person reads enough of the religious texts, the virus self-deletes in a way that doesn't harm the system. If you instead try to remove the virus manually, it has a variety of countermeasures, most of which involve it wiping all data on the host computer.

So that's why, right now, all around the world, we've got technicians in data centers plugging webcams and monitors into servers, then sitting and reading religious texts as they sit, sweating, in the hot confines of their computer facilities. The virus doesn't care about anything but attention. And if you give it attention as a human, it leaves. If you give it attention as a computer, it uses your attention to replicate itself, and aid its own ability to further expand itself through training its distributed neural network.

Things that inspired this story: SETI@Home and Folding@Home if created by religiously-minded people as a half-serious joke; thoughts about faith and what 'attention' means in the context of spirituality; playing around with the different ways theological beliefs will manifest in machines and in people.



Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2021 Import AI, All rights reserved.