Import AI 220: Google builds an AI border wall; better speech rec via pre-training; plus, a summary of ICLR papers

If you haven't met me in real life and are curious what I sound like, take a listen to this Skynet Today 'Let's Talk AI' podcast where I talk about one of my major obsessions - measuring and assessing the onward march of AI progress and impact.

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Want to measure progress towards AGI? Welcome to a Sisyphean task!
....Whenever we surpass an AGI-scale benchmark, we discover just how limited it really was...
One of the reasons it's so hard to develop general intelligence is that whenever people come close to beating a benchmark oriented around measuring progress towards AGI, we discover just how limited the benchmark was and how far we still have to go. That's the gist of a new blogpost by a "fervent generalist" writing under the pseudonym 'Z', which discusses some of the problems inherent to measuring progress towards advanced AI systems.
"Tasks we've succeeded at addressing with computers seem mundane, mere advances in some other field, not true AI. We miss that it was work in AI that lead to them," they write. "Perhaps the benchmarks were always flawed, because we set them as measures of a general system, forgetting that the first systems to break through might be specialized to the task. You only see how "hackable" the test was after you see it "passed" by a system that clearly isn't "intelligent"."

So, what should we do? The author is fairly pessimistic about our ability to make progress here, because whenever people define new harder benchmarks, that usually incentivizes the AI community to collectively race to develop a system that can beat the benchmark. "Against such relentless optimization both individually and as a community, any decoupling between the new benchmark and AGI progress will manifest."

Why this matters: Metrics are one of the ways we can orient ourselves with regard to the scientific progress being made by AI systems - and posts like this remind us that any single set of metrics is likely to be flawed or overfit in some way. My intuition is that the way forward is to develop ever-larger suites of AI testing systems, which we can then use to more holistically characterize the capabilities of any given system.
Read more: The difficulty of AI benchmarks (Singular Paths, blog).

###################################################

What's hard and what's easy about measuring AI? Check out what the experts say:
...Research paper lays out measurement and assessment challenges for AI policy…
Last year I helped organize a workshop at Stanford that brought together over a hundred AI practitioners and researchers to discuss the challenges of measuring and assessing AI. Our workshop identified six core challenges for measuring AI systems:
- Defining AI; as anyone knows, every policymaking exercise starts with definitions, and our definitions of AI are lacking.
- What are the factors that drive AI progress and how can we disambiguate them?
- How do we use bibliometric data to improve our analysis?
- What tools are available to help us analyze the economic impact of AI?
- How can we measure the societal impact of AI?
- What methods can we use to better anticipate the risks and threats of deployed AI systems?

Podcast conversation: Ray Perrault and I, co-chairs of the AI Index (a Stanford initiative to measure and assess AI, which hosted the workshop), recently appeared on the 'Let's Talk AI' podcast to discuss the paper with Sharon Zhou.

Why this matters: Before we can regulate AI, we need to be able to measure and assess it at various levels of abstraction. Better tools for measuring AI systems will help technologists create information that can drive policy decisions. More broadly, by building 'measurement infrastructure' within governments, we can improve the ability of civil society to anticipate and oversee challenges brought on by the maturation of AI technology.
Read more: Measurement in AI Policy: Opportunities and Challenges (arXiv).
Listen to the podcast here: Measurement in AI Policy: Opportunities and Challenges (Let's Talk AI, Podbean).

###################################################
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2020 Import AI, All rights reserved.
You are receiving this email because you signed up for it. Welcome!

Our mailing address is:

Import AI

Many GPUs

Oakland, California 94609

