Import AI 227: MAAD-Face; GPT2 and Human Brains; Facebook detects Hateful Memes

How might an AI system conceptualize time travel? Would it perhaps see the ability to slow time as being (somewhat) comparable to acquiring more compute power and developing more efficient algorithms so it can increase its inference frequency?

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

University of Texas ditches algorithm over bias concerns:
...Gives an F to the GRADE software…
The University of Texas at Austin has stopped using software called GRADE to screen applicants to the PhD program in its computer science department. UT Austin used GRADE between 2013 and 2019, and stopped using it in early 2020, according to reporting from The Register. Some of the developers of GRADE think it doesn't have major issues with regard to manifesting bias along racial or gender lines, but others say it could magnify existing biases present in the decisions made by committees of humans.

Why this matters: As AI has matured rapidly, it has started being integrated into all facets of life. But some parts of life probably don't need AI in them - especially those that involve making screening determinations about people in ways that could have an existential impact on them, like admission to possible graduate programs.
  Read more: Uni revealed it killed off its PhD-applicant screening AI just as its inventors gave a lecture about the tech (The Register).

###################################################

Element AI sells to ServiceNow:
...The great Canadian AI hope gets sold for parts…
American software company ServiceNow has acquired Element AI; the purchase looks like an acquihire, with ServiceNow executives stressing the value of Element AI's talent, rather than any particular product the company had developed.

Why this is a big deal for Canada: Element AI was formed in 2016 and designed as a counterpoint to the talent-vacuums of Google, Facebook, Microsoft, and so on. It was founded with the ambition that it could become a major worldwide player, and a talent magnet for Canada. It even signed on Yoshua Bengio, one of the Turing Award winners responsible for the rise of deep learning, as an advisor. Element AI raised more than $250 million in its lifespan. Now it has been sold, allegedly for less than $400 million, according to the Globe and Mail. Shortly after the deal closed, ServiceNow started laying off a variety of Element AI staff, including its public policy team.

Why this matters: As last week's Timnit Gebru situation highlights, AI research is at present concentrated in a small number of private sector firms, which makes it inherently harder to do research into different forms of governance, regulation, and oversight. During its lifetime, Element AI did some interesting work on data repositories, and I'd run into Element AI people at various government events where they'd be encouraging nations to build shared data repositories for public goods - a useful idea. Element AI being sold to a US firm increases this concentration and also reduces the diversity of experiments being run in the space of 'potential AI organizations' and potential AI policy. I wish everyone at Element AI luck and hope Canada takes another swing at trying to form a counterpoint to the major powers of the day.
  Read more: Element AI acquisition brings better, smarter AI capabilities for customers (ServiceNow).

###################################################

Uh oh, a new gigantic face dataset has appeared:
...123 million labels for 3 million+ photographs...
German researchers have developed MAAD-Face, a dataset containing more than a hundred million labels applied to millions of images of 9,000 people. MAAD-Face was built by researchers at the Fraunhofer Institute for Computer Graphics and is designed to substitute for other labeled datasets like CelebA and LFW. It also, like any dataset involving a ton of labeled data about people, introduces a range of ethical questions.

But the underlying dataset might be offline? MAAD-Face is based on VGG, a massive facial recognition dataset. VGG is currently offline for unclear reasons, potentially due to controversies associated with the dataset. I think we'll see more examples of this - in the future, perhaps some % of datasets like this will be traded surreptitiously via torrent networks. (Today, datasets like DukeMTMC and ImageNet-ILSVRC-2012 are circulating via torrents, having been pulled off of public repositories following criticism relating to biases or other issues.)

What's in a label? MAAD-Face has 47 distinct labels which can get applied to images, with labels ranging from non-controversial subjects (are they wearing glasses? Is their forehead visible? Can you see their teeth?) to ones that have significant subjectivity (whether the person is 'attractive', 'chubby', 'middle aged'), to ones where it's dubious whether we should be assigning the label at all (e.g., ones that assign a gender of male or female, or which classify people into races like 'asian', 'white', or 'black').
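
To make this label structure concrete, here's a minimal sketch in Python of how you might tally objective versus subjective attributes in a MAAD-Face-style annotation table - the column names and values below are illustrative stand-ins, not the dataset's actual schema:

  # Toy stand-in for a MAAD-Face-style annotation table: one row per image,
  # one column per attribute, 1 meaning the label is asserted for that image.
  import pandas as pd

  maad = pd.DataFrame({
      "Eyeglasses":       [1, 0, 0, 1],   # fairly objective labels
      "Visible_Forehead": [1, 1, 0, 1],
      "Attractive":       [0, 1, 1, 0],   # highly subjective labels
      "Chubby":           [0, 0, 1, 0],
      "Middle_Aged":      [1, 0, 0, 1],
  })

  objective = ["Eyeglasses", "Visible_Forehead"]
  subjective = ["Attractive", "Chubby", "Middle_Aged"]

  # Tally how often each label is asserted, to see how much of any downstream
  # analysis would rest on subjective judgments.
  print(maad[objective].sum())
  print(maad[subjective].sum())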

Why this matters - labels define culture: As more of the world becomes classified and analyzed by software systems, the labels we use to build the machines that do this classification matter more and more. Datasets like MAAD-Face both gesture at the broad range of labels we're currently assigning to things, and also should prepare us for a world where someone uses computer vision systems to do something with an understanding of 'chubby', or other similarly subjective labels. I doubt the results will be easy to anticipate.
  Read more: MAAD-Face: A Massively Annotated Attribute Dataset for Face Images (arXiv).
  Get the dataset from here (GitHub).
  Via Adam Harvey (Twitter), who works on projects tracking computer vision like 'MegaPixels' (official site).

###################################################

Is GPT2 like the human brain? In one way - yes!
...Neuroscience paper finds surprising overlaps between how humans approach language and how GPT2 does…
Are contemporary language models smart? That's a controversial question. Are they doing something like the human brain? That's an even more controversial question. But a new paper involving gloopy experiments with real human brains suggests the answer could be 'yes' at least when it comes to how we predict words in sentences and use our memory to improve our predictions.

But, before the fun stuff, a warning: Picture yourself in a dark room with a giant neon sign in front of you. The sign says CORRELATION != CAUSATION. Keep this image in mind while reading this section. The research is extremely interesting, but also the sort of thing prone to wild misinterpretation, so Remember The Neon Sign while reading. Now…

What they investigated: "Modern deep language models incorporate two key principles: they learn in a self-supervised way by automatically generating next-word predictions, and they build their representations of meaning based on a large trailing window of context," the researchers write. "We explore the hypothesis that human language in natural settings also abides by these fundamental principles of prediction and context".

What they found: For their experiments, they used three types of word features (arbitrary, GloVe, and GPT2) and tested how well each could predict neural activity in people, as well as how people performed when asked to predict the next word in different sentences, to see which features made the most effective predictions. Their findings are quite striking - GPT2 assigns probabilities for the next words in a sentence that are very similar to the ones humans assign, and as you increase the context window (the number of words the person or algo sees before making a prediction), performance improves further, and human and algorithmic answers continue to agree.
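
For a rough sense of the model side of this setup, here's a minimal sketch of reading out GPT2's next-word probabilities as the context window grows - it assumes the HuggingFace transformers library and the public 'gpt2' checkpoint, and the example sentence is mine, not one of the paper's stimuli:

  # Minimal sketch: ask GPT-2 for its top next-token guesses while varying how
  # many preceding words it gets to see (its 'context window').
  import torch
  from transformers import GPT2LMHeadModel, GPT2TokenizerFast

  tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")
  model.eval()

  def next_word_probs(context, top_k=5):
      ids = tokenizer(context, return_tensors="pt").input_ids
      with torch.no_grad():
          logits = model(ids).logits[0, -1]        # logits for the next token
      probs = torch.softmax(logits, dim=-1)
      top = torch.topk(probs, top_k)
      return [(tokenizer.decode([int(i)]), float(p))
              for i, p in zip(top.indices, top.values)]

  sentence = "The detective opened the door and saw the"   # illustrative only
  words = sentence.split()
  for window in (2, 4, len(words)):                # grow the context window
      context = " ".join(words[-window:])
      print(window, next_word_probs(context))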

Something very interesting about the brain: "On the neural level, by carefully analyzing the temporally resolved ECoG responses to each word as subjects freely listened to an uninterrupted spoken story, our results suggest that the brain has the spontaneous propensity (without explicit task demands) to predict the identity of upcoming words before they are perceived", they write. And their experiments show that the human brain and GPT2 seem to behave similarly here.

Does this matter? Somewhat, yes. As we develop more advanced AI models, I expect they'll shed light on how the brain does (or doesn't) work. As the authors note here, we don't know the mechanism via which the brain works (though we suspect it's likely different to some of the massively parallel processing that GPT2 does), but it is interesting to observe similar behavior in both the human brain and GPT2 when confronted with the same events - they're both displaying similar traits I might term cognitive symptoms (which doesn't necessarily imply underlying cognition). "Our results support a paradigm shift in the way we model language in the brain. Instead of relying on linguistic rules, GPT2 learns from surface-level linguistic behavior to generate infinite new sentences with surprising competence," writes the Hasson Lab in a tweet.
  Read more: Thinking ahead: prediction in context as a keystone of language in humans and machines (bioRxiv).
  Check out this Twitter thread from the Hasson Lab about this (Twitter).

###################################################


Facebook helps AI researchers detect hateful memes:
...Is that an offensive meme? This AI system thinks so…
The results are in from Facebook's first 'Hateful Memes Challenge' (Import AI: 198), and it turns out AI systems are better than we thought they'd be at labeling offensive versus inoffensive memes. Facebook launched the competition earlier this year; 3,300 participants entered, and the top-scoring team achieved an AUCROC of 0.845 - that compares favorably to an AUCROC of 0.714 for the top-performing baseline system that Facebook developed at the start of the competition.

What techniques they used: "The top five submissions employed a variety of different methods including: 1) ensembles of state-of-the-art vision and language models such as VILLA, UNITER, ERNIE-ViL, VL-BERT, and others; 2) rule-based add-ons, and 3) external knowledge, including labels derived from public object detection pipelines," Facebook writes in a blog post about the challenge.
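
As an illustration of the ensembling idea (not the winners' actual code - the per-model probabilities below are made-up stand-ins), here's a minimal sketch of averaging several models' 'hateful' probabilities and scoring them with AUCROC, the challenge metric:

  # Minimal sketch: average each model's predicted probability that a meme is
  # hateful, then score individual models and the ensemble with AUCROC.
  import numpy as np
  from sklearn.metrics import roc_auc_score

  labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])      # 1 = hateful, 0 = benign
  model_probs = {                                  # hypothetical model outputs
      "VILLA":     np.array([0.9, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.4]),
      "UNITER":    np.array([0.8, 0.3, 0.6, 0.9, 0.2, 0.2, 0.7, 0.5]),
      "ERNIE-ViL": np.array([0.7, 0.1, 0.8, 0.7, 0.4, 0.3, 0.5, 0.3]),
  }

  ensemble = np.mean(list(model_probs.values()), axis=0)   # simple average
  for name, probs in model_probs.items():
      print(name, round(roc_auc_score(labels, probs), 3))
  print("ensemble", round(roc_auc_score(labels, ensemble), 3))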

Why this matters: Competitions are one way to generate signal about the maturity of a tech in a given domain. The Hateful Memes Challenge is a nice example of how a well-posed question and associated competition can lead to a meaningful improvement in capabilities - see the ~13-point absolute improvement in AUCROC (0.714 to 0.845) for this competition. In the future, I hope a broader set of organizations host and run a bunch more competitions.
  Read more: Hateful Memes Challenge winners (Facebook Research blog).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

$50,000 AI forecasting tournament:
Metaculus, a forecasting community and website, has announced an AI forecasting tournament, starting this month and running until February 2023. There will be questions on progress on ~30 AI benchmarks, over 6-month, 12-month, and 24-month time horizons. The tournament has a prize pool of $50,000, which will be paid out to the top forecasters. The tournament is being hosted in collaboration with the Open Philanthropy Project.

Existing forecasts: The tournament questions have yet to be announced, so I'll share some other forecasts from Metaculus (see also Import AI 212). Metaculus users currently estimate: 70% that, if queried, the first AGI system claims to be conscious; 25% that photonic tensors will be widely available for training ML models; 88% that an ML model with 100 trillion parameters will be trained by 2026; 45% that GPT language models generate less than $1bn in revenue by 2025; 25% that, if tested, GPT-3 demonstrates text-based intelligence parity with human 4th graders.

Matthew’s view: As regular readers will know, I’m very bullish on the value of AI forecasting. I see foresight as a key ingredient in ensuring that AI progress goes well. While the competition is running, it should provide good object-level judgments about near-term AI progress. As the results are scored, it might yield useful insights about what differentiates the best forecasts/forecasters. I’m excited about the tournament, and will be participating myself.
Pre-register for the tournament here.

###################################################

Tech Tales:

The Narrative Control Department
[A beautiful house in South West London, 2030]

"General, we're seeing an uptick in memes that contradict our official messaging around Rule 470." "What do you suggest we do?"
"Start a conflict. At least three sides. Make sure no one side wins."
"At once, General."

And with that, the machines spun up - literally. They turned on new computers and their fans revved up. People with tattoos of skeletons at keyboards high-fived each other. The servers warmed up and started to churn out their fake text messages and synthetic memes, to be handed off to the 'insertion team' who would pass the data into a few thousand sock puppet accounts, which would start the fight.

Hours later, the General asked for a report.
"We've detected a meaningful rise in inter-faction conflict and we've successfully moved the discussion from Rule 470 to a parallel argument about the larger rulemaking process."
"Excellent. And what about our rivals?"
"We've detected a few Russian and Chinese account networks, but they're staying quiet for now. If they're mentioning anything at all, it's in line with our narrative. They're saving the IDs for another day, I think."

That night, the General got home around 8pm, and at the dinner table his teenage girls talked about their day.
  "Do you know how these laws get made?" the older teenager said. "It's crazy. I was reading about it online after the 470 blowup. I just don't know if I trust it."
  "Trust the laws that gave Dad his job? I don't think so!" said the other teenager.
  They laughed, as did the General's wife. The General stared at the peas on his plate and stuck his fork into the middle of them, scattering so many little green spheres around his plate.

Things that inspired this story: State-backed information campaigns; collateral damage and what that looks like in the 'posting wars'; AI-driven content production for text, images, videos; warfare and its inevitability; teenagers and their inevitability; the fact that EVERYONE goes to some kind of home at some point in their day or week and these homes are always different to how you'd expect.


Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

