Import AI 220: Google builds an AI border wall; better speech rec via pre-training; plus, a summary of ICLR papers

If you haven't met me in real life and are curious what I sound like, take a listen to this Skynet Today 'Let's Talk AI' podcast where I talk about one of my major obsessions - measuring and assessing the onward march of AI progress and impact. 

Welcome to Import AI, a newsletter about artificial intelligence. Forward this email to give your chums an AI upgrade. Subscribe here.

Want to measure progress towards AGI? Welcome to a Sisyphean task!
....Whenever we surpass an AGI-scale benchmark, we discover just how limited it really was...
One of the reasons it's so hard to develop general intelligence is that whenever people come close to beating a benchmark oriented around measuring progress towards AGI, we discover just how limited the benchmark was and how far we still have to go. That's the gist of a new blogpost by a "fervent generalist" writing under the pseudonym 'Z', which discusses some of the problems inherent to measuring progress towards advanced AI systems.
  "Tasks we've succeeded at addressing with computers seem mundane, mere advances in some other field, not true AI. We miss that it was work in AI that lead to them," they write. "Perhaps the benchmarks were always flawed, because we set them as measures of a general system, forgetting that the first systems to break through might be specialized to the task. You only see how "hackable" the test was after you see it "passed" by a system that clearly isn't "intelligent"."

So, what should we do? The author is fairly pessimistic about our ability to make progress here, because whenever people define new harder benchmarks, that usually incentivizes the AI community to collectively race to develop a system that can beat the benchmark. "Against such relentless optimization both individually and as a community, any decoupling between the new benchmark and AGI progress will manifest."

Why this matters: Metrics are one of the ways we can orient ourselves with regard to the scientific progress being made by AI systems - and posts like this remind us that any single set of metrics is likely to be flawed or overfit in some way. My intuition is that the way forward is to develop ever-larger suites of AI tests, which we can then use to characterize the capabilities of any given system more holistically.
  Read more: The difficulty of AI benchmarks (Singular Paths, blog).

###################################################

What's hard and what's easy about measuring AI? Check out what the experts say:
...Research paper lays out measurement and assessment challenges for AI policy…
Last year I helped organize a workshop at Stanford that brought together over a hundred AI practitioners and researchers to discuss the challenges of measuring and assessing AI. Our workshop identified six core challenges for measuring AI systems:
- Defining AI; as anyone knows, every policymaking exercise starts with definitions, and our definitions of AI are lacking.
- What are the factors that drive AI progress and how can we disambiguate them?
- How do we use bibliometric data to improve our analysis?
- What tools are available to help us analyze the economic impact of AI?
- How can we measure the societal impact of AI?
- What methods can we use to better anticipate the risks and threats of deployed AI systems?

Podcast conversation: Ray Perrault and I - co-chairs of the AI Index, a Stanford initiative to measure and assess AI, which hosted the workshop - recently appeared on the 'Let's Talk AI' podcast to discuss the paper with Sharon Zhou.

Why this matters: Before we can regulate AI, we need to be able to measure and assess it at various levels of abstraction. Developing better tools for measuring AI systems will help technologists create information that can drive policy decisions. More broadly, by building 'measurement infrastructure' within governments, we can improve the ability of civil society to anticipate and oversee challenges brought on by the maturation of AI technology.
  Read more: Measurement in AI Policy: Opportunities and Challenges (arXiv).
  Listen to the podcast here: Measurement in AI Policy: Opportunities and Challenges (Let's Talk AI, Podbean).

###################################################
Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Copyright © 2020 Import AI, All rights reserved.