Import AI 235: Use GEM to test language models; the four eras of facial recognition; and how the US can measure its robot fleet

How big will the market for 'reality simulators' eventually become?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

20 million eye images - get 'em while the link still works!
...Go on, you're a little bit curious about what you could hack around with this...
Researchers with the University of Tübingen in Germany have published TEyeD, a dataset of 20 million eye images gathered via seven different eye tracking formats. The data is diverse - eyes were recorded while driving outdoors, driving in a simulator, and carrying out a variety of indoor and outdoor activities. The dataset includes 2D and 3D segmentations, annotated pupils, eye positions and radii, and more. The authors hope TEyeD will "contribute to the application of eye-movement and gaze estimation techniques in challenging practical use cases."
  Read more: TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types (arXiv).
  Get the data here (via a Sharepoint link).

###################################################

Happy 2021 - Lacuna Fund is about to help create more African agriculture datasets:
...First round of grants shows what applied machine learning means…
Lacuna Fund, an organization that funds the creation of labeled datasets for underserved communities, is supporting six projects focused on agricultural data. Lacuna also wants to support the creation of language datasets in sub-Saharan Africa (Import AI 216).

Six projects for better data: The projects involve datasets for georeferenced crop images, land use planning in Tanzania, crop pest and disease diagnosis, water use, cleaning up existing crop-cut yield datasets, and a five-country crop dataset meant to be gathered via cameras mounted on custom-designed vehicles.
  Read more about the awards here (Lacuna Fund website).
  Via: AI Kenya newsletter (Mailchimp archive).

###################################################

Here's how the USA could get a handle on AI policy:
...One weird trick to give the government the jump on the robots…
Measurement is a prerequisite to sensible policymaking - if you can't measure or quantify something, it's hard to regulate or manage it. Rob Seamans, a professor with NYU, wants to help the US measure the impact of AI on its economy and has written a piece in Brookings outlining how to do that.

The key? The US needs to measure how the addition of robots and/or AI-oriented software influences productivity at firms or specific firm-owned places (e.g., a warehouse). The US does not do this today. It used to - in the 1980s and 1990s the US conducted the 'Survey of Manufacturing Technology', but retired it in the 1990s due to government cutbacks. Seamans' suggestion is a pretty simple one (which is why it might work): bring back the survey and run it annually.

What should we ask America about AI? "The survey would include questions about the use of specific technologies, such as robots, machine learning, cloud, e-commerce, autonomous guided vehicles, and others, and could be a simple “yes/no” question about whether the establishment has the technology or not," Seamans writes. "There would be multiple benefits to a standalone survey of technology. The survey would allow researchers to identify sectors and regions of the economy that are being impacted by new technologies."

Why do this at all? Data from France shows that when companies add robots, they create more jobs. The US should do a better job of gathering this kind of data so researchers can easily run the same study here, Seamans argues. "While there is excitement about the impact that new technologies like artificial intelligence and robotics will have on our economy, we need to do more to measure where and how these technologies are being used," he writes.
  Read more: Robot census: Gathering data to improve policymaking on new technologies (Brookings).

###################################################

Language models are here, but how do we evaluate them? Try GEM:
...Multi-task benchmark aims to give us better signals about AI progress…
A gigantic team of researchers has collaborated to build GEM, a benchmark to help evaluate progress in natural language generation (NLG). NLG is going to be a big deal in the next few years, as the success of models like GPT-3 creates demand for better ways to evaluate synthetically generated text. GEM is a hard, multi-task generative benchmark which AI researchers can use to test the capabilities of their models.

11 tests: The first version of GEM includes 11 test datasets and tasks that "measure specific generation challenges, such as content selection and planning, surface realization, paraphrasing, simplification, and others". The initial datasets are: CommonGen, Czech Restaurant, DART, E2E clean, MLSum, Schema-Guided Dialog, ToTTo, XSum, WebNLG, WikiAuto + Turk/ASSET, and WikiLingua.

Data cards: The GEM-creators are thinking about AI policy as well, because they've included a 'data statement' for each of the 11 included tasks. A data statement works like the label on food - it lists out the ingredients and some of the salient intended (and unintended) uses. Today, most AI systems are broadly undocumented, so it's notable that GEM prioritizes data legibility in the first version of the benchmark.
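To make the food-label analogy concrete, here's a minimal, hypothetical sketch of what a data statement for one GEM task might look like; the field names and rendering are illustrative assumptions, not GEM's actual schema.

```python
# A hypothetical data statement (data card) for one of GEM's 11 tasks.
# Field names here are illustrative, not GEM's real schema.
data_card = {
    "name": "XSum",
    "task": "abstractive summarization",
    "language": "English",
    "intended_uses": [
        "evaluating single-sentence news summarization",
    ],
    "known_limitations": [
        "news-domain text only; summaries may contain hallucinated facts",
    ],
}

def describe(card: dict) -> str:
    """Render a short, food-label-style summary of the dataset's 'ingredients'."""
    uses = "; ".join(card["intended_uses"])
    return f"{card['name']} ({card['task']}, {card['language']}): {uses}"

print(describe(data_card))
```

The point of such a structure is that the documentation travels with the dataset and can be checked mechanically, rather than living in a paper nobody reads.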

Why this matters: Evaluating generative models is challenging because they have vast capability surfaces which are hard to characterize with today's tests. Systems like GEM will help us get (somewhat fuzzy) signals about the creative and generative capabilities of these models. The more - and better - tests we have, the easier it's going to be to craft sensible policies around the deployment of AI systems.
  Read more: The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics (arXiv).
  Find out more at the official website (GEM Benchmark website).

###################################################

What's the big deal about facial recognition? A historical analysis gives us some answers:
...Facial recognition has industrialized, so we should take it seriously…
Facial recognition is one of the most prominent uses of contemporary AI, powering everything from unlocking your phone, to applying filters to your face in consumer apps, to tracking and surveilling individuals for security purposes. But where did facial recognition come from, and how significant is the moment we're in now? That's the question two researchers try to answer with a review of how facial recognition evaluation has evolved over time.

The four periods of facial recognition: Facial recognition has gone through four distinct eras, each corresponding to a stage of technology development and commercial interest. The authors do some really valuable work in providing statistics that help us understand the salient aspects of each era. These are:
- Period 1: Early research findings: 1964-1995: 5 datasets created, with an average number of ~2000 images per dataset.
- Period 2: Commercial viability: 1996-2006: 37 datasets created, with an average number of ~11,000 images each.
- Period 3: Mainstream development: 2007-2013: 33 datasets, with an average number of ~46,000 images per dataset.
- Period 4: Deep learning breakthrough: 2014 onwards: 45 datasets, with an average number of ~2,600,000 images per dataset.

The most influential datasets: The authors also identify the most influential face datasets (according to citations), for each period. For the four periods, the popular datasets are: Picture of Facial Affect (P1), FERET (P2), Labeled Faces in the Wild (P3), and VGGFace (P4).

Why this matters: Recent advances in deep learning have made it generally cheaper to deploy more performant vision-based surveillance systems. At the same time, the data-intensiveness of the underlying computer vision algorithms has increased to the point that it's very challenging to analyze and evaluate the datasets used to train these systems (you try and classify two million of anything and see how far you get). This also incentivizes people to move from curating precise datasets to indiscriminately scraping the cheapest (and, on some metrics, arguably most diverse) form of data - the internet.
    The usage of facial recognition has evolved in tandem with these changes in technical infrastructure - "we’ve seen the trend in facial recognition evaluation shift broadly from a highly controlled, constrained and well-scoped activity to one that is not," the authors write. "At minimum, an important intervention moving forward is to standardize documentation practice, of the model and the face datasets meant to be used in development or evaluation".
  Read more: About Face: A Survey of Facial Recognition Evaluation (arXiv).

###################################################

Weights and Biases raises $45 million Series B:
...Measurement means money...
AI startup Weights and Biases has closed a $45m funding round, as investors bet that in the future more companies are going to invest in measuring and analyzing their machine learning infrastructure and models. W&B's software is for machine learning operations - think of this as the systems that AI practitioners use to help them train and develop models.

Why this matters: Funding for companies like W&B is a broader symptom of the industrialization of AI technology - we're seeing the emergence of pure 'B2B' businesses built not around specific AI components, but around facilitating AI infrastructure.
  Read more: Weights and Biases Raises $45M Series B to Expand Beyond Experiment Tracking for Machine Learning Practitioners Everywhere (PRNewswire).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

More turmoil in AI ethics at Google:
In December, Google’s co-lead of Ethical AI, Timnit Gebru, was forced out in a dispute about academic freedom (see Import 226). Gebru had been pressured to withdraw a paper she had co-authored on the societal impacts of large language models. Axios reports that Google is now investigating Gebru’s co-lead, Margaret Mitchell, and has locked her email accounts, accusing her of downloading and sharing company files. Mitchell was reportedly collecting evidence of discriminatory treatment of Gebru. The newly formed Alphabet Workers Union calls the company’s actions “an attack on the people who are trying to make Google’s technology more ethical.”

###################################################

Tech Tales

The Glass Child
[Earth, 2050-35??]

The child stood there, embedded in glass, and people worshipped it and fought over it and tried to breach it (fruitlessly) and feared it and so on, for hundreds of years. 

It was the child of a rich person who had foreseen the Time of the Scourge, and had paid to embed his kid into a multi-thousand-year life-preserving substrate, itself sheathed in an ultra-hard complex material that most would mistake for glass. The child seemed to float, suspended, in the center of a 10 foot tall translucent and impenetrable rectangle. The child was kept alive through obscure technologies, but appeared mostly dead to any observers. The 'mostly' part came from the color of his skin - he was grey, yes, but when lit by torchlight or electrics his skin would shine and seem to hint at an inner strength. Over hundreds of years, different groups of scavengers told individually varied stories about how they'd heard the child trapped in the glass sing, or laugh, or shout.

People developed rituals around the child; mothers brought their sick children to the glass rectangle and they'd lay blankets down and leave their babies on it overnight. The superstition wasn't justified, but that didn't mean it was wrong - the same technologies that kept the boy alive took the form of a field, and this field radiated out from the boy, reaching the edge of the glass and slightly beyond. The effect was neither dramatic nor obvious, but it worked just enough of the time that the rituals held. Over time, the child became an icon for health and was sainted and worshiped and, yes, fought over.

For a while, there was a king who was convinced if he stayed close to the child he, too, would live forever. He had a great castle built around the glass rectangle and had his throne placed against it. When you met with the king you'd go into a great room and the king would stare at you and, above and behind him, the pallid child would hang there in the glass. People convinced themselves that the child was watching them and that the king talked to it.

The king did live a long time, aided by the mysterious field. And as most do, the king became more idiosyncratic the older he got, which ultimately led to him visiting great misery on the people within his dominion. They rebelled, as people tend to do, and tore down the castle in which the king lived. They heaped great fires around the glass rectangle and burned the materials of the palace. After a week, the fire went out, and the rectangle was unscathed.

So the people called the land cursed. Before they left, a group of them painted the rectangle with black paint, sealing in the child. Then they took their carts and their families and they left.

Things that inspired this story: Old hard drives; the relationship between memory and a sense of life; how people naturally coordinate around artefacts regardless of what the artefact is.


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2021 Import AI, All rights reserved.
You are receiving this email because you signed up for it. Welcome!

Our mailing address is:
Import AI
Many GPUs
Oakland, California 94609

