Import AI 264: Tracking UAVs; Stanford tries to train big models; deepfakes as the dog ate my homework

A lot of the biggest users of AI are predominantly web entities that deal in the virtual rather than the physical (e.g., Google, Facebook). Though there are some applications of modern AI to robots, uptake is a lot smaller than with the web platforms. Is there some rough law by which AI diffuses onto software versus hardware platforms?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Here's a symptom of how AI is changing culture:
...Deepfakes show up as excuses…
Deepfakes are steadily percolating their way into society - the latest evidence of this is people using the very existence of the technology to question the legitimacy of things they may have been recorded doing or saying. One example comes via an interview in this excellent New Yorker piece about a coin called Skycoin: when someone was reached for comment about something they were recorded saying, they said it was "either a joke or a deep fake but probably a deep fake.”
  Read more: Pumpers, Dumpers, and Shills: The Skycoin Saga (New Yorker).

####################################################

Stanford gets ready to train some large AI models:
...Though it's starting with just some GPT-2 scale things…
Something we write about a lot here at Import AI is power: who has it and who doesn't. Right now, the people who have the resources and engineering capabilities to build large models (e.g., GPT-3) have a lot of leverage in the space of AI development. Universities, by comparison, have less leverage as they don't build these models. Now, researchers with Stanford University are trying to change that with an open source tool called 'Mistral', which is meant to make it easier to train large language models.

What Mistral is: Mistral is "A framework for transparent and accessible large-scale language model training, built with Hugging Face". Along with releasing Mistral, the researchers have also released five medium GPT-2 and five small GPT-2 models, along with ten checkpoints of the models through training runs. That's kind of like a biologist giving you two sets of five petri dishes of similar organisms, where each of the petri dishes comes with detailed snapshots of the evolution of the entity in the petri dish over time. That's the kind of thing that can make it easier for people to research these technologies.
  Get the code and model snapshots here: Mistral (Stanford CRFM GitHub).
  Check out the talk in this (long) YouTube recording of the CRFM workshop, where some of the authors discuss their motivations for the models (CRFM webpage).

####################################################

1 GPU, 1 good simulator = efficient robot training:
...Plus: transatlantic robot manipulation…
Researchers with the University of Toronto, ETH Zurich, Nvidia, Snap, and MPI Tuebingen have built some efficient software for training a 3-finger robot hand. Specifically, they pair a simulator (NVIDIA's 'IsaacGym') with a low-cost robot hand (called a TriFinger, which is also the robot being used in the real robot challenge at NeurIPS 2021 - Import AI #252).

What they did: "Our system trains using the IsaacGym simulator, we train on 16,384 environments in parallel on a single NVIDIA Tesla V100 or RTX 3090 GPU. Inference is then conducted remotely on a TriFinger robot located across the Atlantic in Germany using the uploaded actor weights", they write. Their best policy achieves a success rate of 82.5% - interesting performance from a research perspective, though not near the standards required for real world deployment.

Efficiency: They use an optimized version of the PPO algorithm to do efficient single-GPU training, taking as inputs the camera pose (with noise) and the position of the cube being manipulated. The output of the policy is a set of joint torques, and they train various permutations of the policy using domain randomization to vary object mass, scale, and friction. They can pull 100k samples per second off an Isaac simulation using a single RTX 3090 GPU. It's not clear how generalizable this efficiency is (aka, is a lot of the efficiency here down to a ton of human-generated, task-specific priors? It seems that way).
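
As a rough illustration of what domain randomization looks like here - sampling different physics parameters for each of thousands of parallel environments - a minimal sketch, with invented parameter ranges (the paper varies mass, scale, and friction, but these specific bounds are illustrative, not theirs):

```python
import random

# Hypothetical randomization ranges - illustrative, not the paper's values.
RANDOMIZATION_RANGES = {
    "cube_mass_kg": (0.05, 0.25),
    "cube_scale": (0.9, 1.1),
    "friction": (0.5, 1.5),
}

def sample_env_params(rng: random.Random) -> dict:
    """Draw one set of physics parameters for a single simulated environment."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

def make_batch(num_envs: int, seed: int = 0) -> list:
    """Sample parameters for many parallel environments (the paper runs 16,384)."""
    rng = random.Random(seed)
    return [sample_env_params(rng) for _ in range(num_envs)]

envs = make_batch(16384)
```

A policy trained across this spread of physics is less likely to overfit to any one simulator configuration, which is what lets the learned weights transfer to a real robot.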
Code: "The codebase for the project will be released soon," they write.
Read more: Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger (arXiv).
Check out a video about the research here (YouTube).

####################################################

How are we going to fight UAVs? By tracking them:
...Anti-UAV workshop tells us about the future of anti-drone tech…
Researchers with a range of Chinese institutions have held a workshop dedicated to tracking multiple unmanned aerial vehicles (UAVs) at once. The point of the workshop is to build so-called anti-UAV tech - that is, AI tools for spotting drones. The competition is premised on the idea that "how to use computer vision algorithms to perceive UAVs is a crucial part of the whole UAV-defense system", the researchers write.

The anti-drone dataset: For the competition, competitors got access to a dataset containing "280 high-quality, full HD thermal infrared video sequences, spanning multiple occurrences of multi-scale UAVs." This footage contains "more challenging video sequences with dynamic backgrounds and small-scale targets" than those from prior competitions, they write. It also includes drones of different sizes, ranging from tiny consumer drones, through mid-range DJIs, up to the sorts of big drones used in industrial contexts.

Winners (and how they won): The paper includes an analysis of the three top teams, all of which come from Chinese universities. The top-ranking team, from Beijing Institute of Technology, used a spatio-temporal Siamese network-based tracker. The other two teams both used the 'SuperDIMP' tracker, though one used an ensemble of trackers and got them to vote on likely targets, while the other further refined SuperDIMP.
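
The paper doesn't spell out how the ensemble's vote works, but a minimal sketch of one way such box-level voting could go is to pick the candidate whose bounding box agrees most (by intersection-over-union) with the other trackers' candidates - the details here are an assumption, not the team's actual method:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def vote(boxes):
    """Pick the candidate that agrees most with the others:
    the box with the highest summed IoU against all other candidates."""
    return max(boxes, key=lambda b: sum(iou(b, o) for o in boxes if o is not b))
```

With this scheme, an outlier prediction from one tracker (say, a bird mistaken for a drone) overlaps nobody and so gets outvoted by the trackers that agree with each other.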
  Read more: The 2nd Anti-UAV Workshop & Challenge: Methods and Results (arXiv).
Find out more information at the official challenge website (ICCV 2021 site).

####################################################

Making GraphQL calls more efficient with machine learning:
...In the latest installment of everything-can-be-approximated: predicting the cost of fulfilling GraphQL calls...
IBM and academic researchers have built a machine learning model that can predict the cost of a given GraphQL query, potentially making it easier for operators of GraphQL APIs to fulfill a larger proportion of user requests. GraphQL is a query language for APIs and a backend that makes it easy to funnel complex requests between users and sites; it was originally developed by Facebook. The approach combines features derived from natural-language processing, graph neural nets, and symbolic analysis, and creates "a general ML workflow to estimate query cost that can be applied to any given GraphQL API".
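
As a toy illustration of the core idea - not the paper's actual pipeline, which uses learned NLP and graph-neural-net features - here's a sketch contrasting a static worst-case bound with a hypothetical learned linear estimate over simple symbolic features of a query; the sample query and the weights are made up:

```python
import re

# Hypothetical GraphQL query, invented for illustration.
SAMPLE_QUERY = """
query {
  repository(owner: "example", name: "example") {
    issues(first: 50) {
      nodes { title
        comments(first: 10) { totalCount }
      }
    }
  }
}
"""

def max_nesting_depth(query: str) -> int:
    """Deepest level of brace nesting in the query."""
    depth = best = 0
    for ch in query:
        if ch == "{":
            depth += 1
            best = max(best, depth)
        elif ch == "}":
            depth -= 1
    return best

def pagination_limits(query: str) -> list:
    """All `first: N` pagination limits, in order of appearance."""
    return [int(n) for n in re.findall(r"first:\s*(\d+)", query)]

def static_upper_bound(query: str) -> int:
    """Worst-case cost: assume every paginated list comes back full,
    so limits multiply along the nesting (50 * 10 = 500 here)."""
    bound = 1
    for n in pagination_limits(query):
        bound *= n
    return bound

def ml_style_estimate(query: str) -> float:
    """Linear estimate over symbolic features. The weights are
    invented placeholders standing in for the paper's trained models."""
    feats = [max_nesting_depth(query), sum(pagination_limits(query))]
    weights = [2.0, 0.8]  # illustrative only
    return sum(w * f for w, f in zip(weights, feats))
```

The gap between the two numbers is the paper's whole point: the static bound must assume every list is full, while a model trained on real responses can learn that most queries come back far smaller.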

Testing the approach: To test the approach, the authors collected 100,000 and 30,000 responses from, respectively, the GitHub and Yelp GraphQL APIs, then used a mostly automated software pipeline to explore 1,500 combinations of models and hyperparameters for both Yelp and GitHub. The result was a set of models that seemed to make useful predictions relative to hand-written expert system baselines.
  "We observe that, while the static analysis guarantees an upper bound, the price in terms of over-estimation can be significant, especially with larger query sizes. On the other hand, for both datasets, the ML estimates stay remarkably close to the actual response complexity even for the largest queries."
 Mean absolute error: "For both datasets, the accuracy gain of the ML approach compared to the static analysis is striking both in terms of average value, and standard deviation," the authors write. "This further validates the observation that ML approach is accurate for large queries, which are challenging for the static analysis… the ML cost estimation policy is able to accept a bigger proportion of queries for both APIs."

Why this matters: Taken in itself, this is some software that slightly lowers the cost of serving and fulfilling GraphQL requests. But if we zoom out, this is another example of just how powerful ML techniques are at approximating complex functions, and it highlights how we're moving into a world driven by approximation engines rather than specific hand-written accounting systems.
  Read more: Learning GraphQL Query Costs (Extended Version).

####################################################

Reminder: Microsoft created one of China's most popular chatbots:
...Before there was Tay, there was Xiaoice - and it's still going…
Here's a fun story about how millions of people in China (660 million people worldwide) are increasingly depending on a relationship with a virtual chatbot - Xiaoice, a chatbot originally built by Microsoft and subsequently spun out into a local startup. Xiaoice is a hybrid system, blending modern deep learning techniques with a lot of hand-written stuff (for a deepdive, check out Import AI #126).
  Microsoft spun Xiaoice off into its own entity in mid-2020 - a story that I think passed many people by in the West. Now, the startup that develops it is worth over $1 billion and is led by a former Microsoft manager.

Who speaks to the chatbots: Xiaoice's CEO says the platform's peak user hours - 11pm to 1am - point to an aching need for companionship. "No matter what, having XiaoIce is always better than lying in bed staring at the ceiling," he said.
Read more: 'Always there': the AI chatbot comforting China's lonely millions (France24).
  More information about the spinout here:Tracing an independent future for Xiaoice, China’s most popular chatbot (KrASIA).

####################################################

Tech Tales:

Escape Run
[London, 2032]

We got into the van, put on our masks, changed our clothes for ones with weights sewn into the lining to change our gait, then drove to our next location. We got out, walked through a council block and used some keycards to exit through a resident-only park, then got into another vehicle. Changed our masks again. Changed our clothes again. One of us went and hid in a compartment in the truck. Then when we got to the next location we got out but left the person inside the truck, so we'd confuse anything that was depending on there being a certain number of us. Then we went into a nearby housing block and partied for a few hours, then left in different directions with the other partygoers.
  We all slept in different places in the city, having all changed outfits and movement gaits a few times.
  That night, we all checked our phones to see if we'd had any luck finding our counterparts. But our phones were confused because the counterparts were also wearing masks, changing cars, swapping clothes, and so on.
    We sleep and hope to have better luck tomorrow. We're sure we'll find each other before the police find us.

Things that inspired this story: Adversarial examples; pedestrian re-identification; gait recognition.



Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2021 Import AI, All rights reserved.
