March 11, 2025

🌁#91: AI Literacy (full email)

As AI models graduate to more sophisticated reasoning, humans – especially policymakers – remain stuck in the basics

This Week in Turing Post:

Wednesday, AI 101, Concept: we explore what if blend LightThinker and Multi-Head Latent Attention (MLA)
Friday, Interviews:
❗️today and tomorrow I’m interviewing CEOs of ElevenLabs, Lamini, and Pinecone – let me know if you have any questions for them ❗️

How much do we know about AI

Well, I would assume that my readers know some. You are builders, you are engineers, you are vivid learners. But there are literally billions of other people who are conflicted about what AI is and how deeply machine learning is integrated into our lives.

This week, I’m moderating a few sessions at the HumanX conference, and yesterday, at the opening, that was the main cry from AI leaders I talked to and from the stage, which featured a few notable politicians – the lack of knowledge. How often do I give the stage to a politician here? Never before, but here are a few quotes that made me hopeful (though I’m still very skeptical that the government can properly regulate AI, being mostly so unknowledgable about what AI is).

Meet Congressman Jay Obernolte, Chairman of the Task Force on AI at the U.S. House of Representatives:

"Between the two of us in the Task Force [himself and Congressman Ted Lieu], we are fully half of the computer scientists in Congress. So please let me implore the audience here. We are underrepresented, right? We need your guidance. Please send us more computer scientists.”
“We need to push back on the misperception that AI is currently unregulated in the United States. That is absolutely wrong.”
"The risks of AI are highly contextual, so it matters very much what you are doing with AI when you evaluate what the risks of that deployment are."
"We took 24 diverse members who most of whom knew very little about AI coming in. I want people from all different policy committees so that when we're done, they can go. Not only do they bring us their perspective, but they go back to their committees and evangelize the work that we're doing."
"AI is unlike a lot of topics that we legislate on. It's been informed by, I would say, misinformed by 50 years of science fiction and pop culture. And if you ask the average American what AI is and what it isn't and what the chief hazards are, you'll get something out of a Terminator movie where an evil army of robots rises up to take over the world..."

Because that was exactly the motivation behind starting TheSequence and Turing Post: to bust that persistent Terminator myth and build knowledge about AI and ML.

When we say AI, does it mean computer vision? Does it mean data labeling? Or is it robotics? The thing is – and that’s exactly where it gets tricky – it’s all of these things. And now, in the age of GenAI, we are forced to combine so many technologies to stay ahead of the curve. You can’t just stick to your cozy data labeling anymore – you need to upgrade to synthetic data. You can’t just fine-tune models – you need to leverage retrieval-augmented generation (RAG).You can’t just focus on single-modal AI – you need to work with multimodal architectures that integrate text, image, and audio. You can’t just build classifiers – you need to create AI systems that understand context and nuance, etc etc.

And the politicians working on AI bills? They know so little. Yesterday, when Congressman Jay Obernolte used the word distillation, the room applauded: “He knows knowledge distillation!” But he is a computer scientist, after all.

It’s really a shame that we still lack so much knowledge about machine learning and AI. As I always try to demonstrate through Turing Post, ML has a rich history spanning more than a hundred years. And now, all the important stakeholders – government officials who regulate, teachers who educate our kids, doctors who diagnose and treat us and many many others – need to understand what they are working with.

You know why? Because there’s no sign of slowing down. They are going to work with AI. Here’s an interesting observation from my colleague Alyona:

“The emergence of Chain of Draft gave me a new insight: models with higher intelligence (probably) won’t need to rely on detailed, step-by-step reasoning, explaining every step. Instead, short but meaningful steps will be enough to find the right answer. Let’s look at this from a human perspective to see the parallels. Chain-of-Thought is similar to how humans learn and explain their thinking during childhood and school. When people learn something new, they need detailed reasoning to check themselves and explore all aspects. But when they take exams, they don’t have time for such lengthy thinking – they must demonstrate their knowledge using only the most important points. Similarly, when solving tasks they’ve encountered many times before, people naturally skip over unnecessary details to save time. This is also an indicator of professionalism (just imagine how much time we would waste if everyone explained every small step at work).

Chain-of-Thought illustrates how models process knowledge in detail, requiring them to go over the same tasks repeatedly with full explanations. In contrast, Chain of Draft represents the next step in intelligence – where models are effectively ‘taking an exam,’ demonstrating their knowledge concisely. Chain of Draft is more user-oriented, it’s more “mature”, while Chain-of-Thought remains a crucial technique for developers to assess models’ capabilities, much like teachers evaluating students in school.”

I don’t believe humans are at the Chain of Draft stage with AI. We lack tremendously in AI literacy. So if even the models are passing an exam and graduating to a different level, we humans most certainly have to do the same. And that’s not even touching on the topic of educating our kids about AI. (I want to start working on a course for AI for kids – let me know if you want to collaborate on this topic.)

What can we do? Please educate those around you. Share resources like Turing Post, Interconnects, AI Made Simple, Latent Space, and blogs on Hugging Face with those who need this knowledge. It’s no longer just a nice thing to do – obtaining knowledge about our own creation is of utmost importance.

Follow us on Hugging Face 🤗

Curated Collections

We are reading/watching:

A NewsGuard audit exposes “Pravda,” (which means Truth) a Moscow-based disinformation network, for flooding AI training data with pro-Kremlin falsehoods—3.6 million articles in 2024 alone. Leading AI chatbots echoed these narratives 33% of the time, distorting global AI-generated news. American fugitive-turned-propagandist John Mark Dougan even boasted that Russian narratives could “change worldwide AI,” validating concerns over AI’s susceptibility to manipulation – another indication to push for AI literacy.
The future of Quantum and AI – an interview with Satya Nadella

Recommendation from an AI practitioner

Give a try to these two video-generation models: Character-3 from Hedra Studio – an omnimodal AI, integrating text, image, and audio to simplify content creation. I played with it for 5 min (that’s my way to say who far the models get):

Experimenting with Hedra

Luma made it “Dream Machine” available this week also

Experimenting with Luma models

I played with both – given enough time and precision, you really can create a high-quality video. However, making a real video might still be faster. Prompts are in the description to each video.

News from The Usual Suspects ©

Perplexity – Expanding Beyond the Web

Perplexity wants to break out of the browser, with growing signs of partnerships with hardware companies to integrate its AI into everyday devices. Deutsche Telekom’s AI Phone, featuring Perplexity’s assistant, debuts this year, blending AI seamlessly into voice interactions. Phones for now, then TVs? Where next?

Manus – China’s AI Challenger Goes Global

China’s AI ambitions get a new face with Manus, a high-performing AI agent (Built on Anthropic Claude Sonnet) from Monica.ai, reportedly outpacing OpenAI and Anthropic on key benchmarks. Founded by Xiao Hong, Manus started as a browser plugin and is now a $100M startup targeting international markets – strategically dodging China’s AI regulations. Unlike AGI purists, Xiao is focused on business, leveraging user data for monetization. Exclusive and invite-only, Manus could redefine China’s AI playbook abroad.

Apple – AI Delays, But Silicon Strength

Apple faced setbacks and successes in AI this week. Siri’s much-touted AI enhancements, promised for 2024, have been delayed. Security risks, particularly prompt injection vulnerabilities (as noted by Simon Willison), may be a factor. But on the hardware side, Apple flexes its muscle with the M3 Ultra chip, cementing its status as the leader in AI silicon. Ben Thompson argues Apple should open its AI models to developers, shifting from an aggregator to a true AI platform, leveraging its hardware to create new ecosystems.

Cortical Labs – AI Meets Biology

A glimpse of the future: Cortical Labs’ CL1 computer fuses human brain cells with silicon (!) to create an adaptive, low-energy AI system. Kept alive with pumps and temperature controls, the system has already taught itself to play Pong. With implications for AI, robotics, and neuroscience, it raises big ethical questions about machine consciousness. At $35,000 per unit, it ships in June 2025 – ushering in an era of living computers.

Mistral OCR – AI That Reads Like a Human

Mistral unveils a cutting-edge document understanding API, excelling in text, table, and equation extraction at scale. A new benchmark in AI-powered OCR.

Reinforcement Learning – A Win for AI’s Founding Fathers

Andrew Barto and Richard Sutton, pioneers of reinforcement learning, take home the 2024 Turing Award. Their work from the 1980s underpins everything from AlphaGo to modern AI assistants, fulfilling Turing’s vision of machines that learn from experience. From robotics to targeted ads, their impact is everywhere. A well-earned recognition.

Models to pay attention to:

Differentiable Logic Cellular Automata (Google Research) – Integrates Neural Cellular Automata with Differentiable Logic Gate Networks to enable self-healing, pattern generation, and robust computational architectures →read their blog
Phi-4-Mini Technical Report (Microsoft) – Introduces a 3.8B parameter multimodal model using Mixture-of-LoRAs, excelling in math, coding, and reasoning while maintaining efficiency →read the paper
Babel: Open Multilingual Large Language Models (Alibaba) – Develops an open-source LLM serving 90% of global speakers across 25 languages, excelling in underrepresented linguistic benchmarks →read the paper
Aya Vision: Expanding the Worlds AI Can See (Cohere) – Introduces an open-weight vision model outperforming larger competitors in multilingual and multimodal benchmarks →read the paper
LLMVoX: Autoregressive Streaming Text-to-Speech Model – Proposes a lightweight, LLM-agnostic TTS system with low latency, high accuracy, and seamless integration into multimodal AI →read the paper
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (Moonshot AI) – Proposes LanDiff, a hybrid text-to-video model combining LLMs and diffusion techniques, surpassing existing models like Hunyuan Video and Sora →read the paper

The freshest research papers, categorized for your convenience

There were quite a few TOP research papers this week, we will mark them with 🌟 in each section.

Scaling and Optimization of Large Models

Dedicated Feedback and Edit Models Empower Inference-Time Scaling – Improves LLM inference by layering critique and refinement steps, enabling superior performance and zero-shot distillation benefits.
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation – Speeds up ultra-long text generation (100K tokens) by optimizing KV cache updates, cutting processing time from hours to minutes.
HybridNorm: Towards Stable and Efficient Transformer Training – Enhances transformer training stability by combining normalization strategies, improving loss reduction and benchmark performance.
Liger: Linearizing Large Language Models to Gated Recurrent Structures – Converts LLMs into efficient recurrent structures, preserving accuracy while reducing inference costs.

Model Architectures and Efficiency Improvements

Union of Experts: Adapting Hierarchical Routing to Decomposed Transformers – Reduces computation by 76% using a hierarchical mixture-of-experts framework with selective multi-head attention.
Visual-RFT: Visual Reinforcement Fine-Tuning – Improves Large Vision-Language Models with reward-driven fine-tuning, significantly enhancing classification and object detection accuracy.
STORM: Token-Efficient Long Video Understanding – Optimizes long-video processing for multimodal models, reducing token requirements while maintaining high accuracy.
EgoLife: Towards Egocentric Life Assistant – Advances AI-powered personal assistants using egocentric video datasets for long-term recall and event tracking.

Reasoning, Self-Improvement, and Problem-Solving

Cognitive Behaviors that Enable Self-Improving Reasoners – Identifies cognitive patterns that help LLMs improve through reinforcement learning, even when correctness is not guaranteed.
LADDER: Self-improving LLMs through Recursive Problem Decomposition – Boosts mathematical problem-solving by breaking down complex problems into simpler subproblems.
START: Self-taught Reasoner with Tools – Enhances LLM reasoning with external tool use, refining performance through guided prompt-based fine-tuning.
Process-based Self-Rewarding Language Models – Enables LLMs to iteratively refine their reasoning by incorporating self-evaluated feedback loops.

Uncertainty, Robustness, and Evaluation of LLMs

When an LLM is Apprehensive About Its Answers – Assesses LLM uncertainty by comparing entropy-based predictions against domain-specific correctness.
Mask-DPO: Generalizable Fine-grained Factuality Alignment – Aligns LLMs with factual accuracy by selectively training on verifiable statements, improving knowledge representation.
Large-Scale Data Selection for Instruction Tuning – Improves instruction-tuning by evaluating different dataset selection techniques, highlighting a more effective approach.
Lingoly-Too: Disentangling Memorization from Reasoning – Tests whether LLMs truly reason or just memorize by applying obfuscation techniques to language datasets.

Agent-Based Learning and Multi-Agent Systems

MPO: Boosting LLM Agents With Meta Plan Optimization – Improves LLM-based agents by introducing high-level meta-plans, refining their decision-making process.
ATLAS: Agent Tuning via Learning Critical Steps – Enhances LLM-driven agents by selectively fine-tuning them on critical decision-making steps.
Reliable and Efficient Multi-Agent Coordination via GNN-VAEs – Optimizes multi-agent planning using graph neural networks, ensuring scalability for real-world applications.

Applications in Games, Coding, and Specialized Domains

PokéChamp: an Expert-level Minimax Language Agent – Develops an LLM-based Pokémon battle agent that outperforms rule-based and LLM-assisted competitors.
Kodcode: A Diverse, Challenging, and Verifiable Coding Dataset – Improves code generation benchmarks with a 447K-problem dataset containing verified solutions and test cases.
Fine-Tuning Small Language Models for Domain-Specific AI – Optimizes small-scale models for edge AI applications, balancing efficiency and task-specific accuracy.
A Multimodal Symphony: Integrating Taste and Sound Through Generative AI – Explores the intersection of taste perception and music generation using generative AI models.

Search and Optimization for Planning Tasks

Language Models can Self-Improve at State-Value Estimation – Enhances search efficiency in interactive planning tasks by refining LLM state-value estimation.
HoT: Highlighted Chain of Thought for Referencing Supporting Facts – Improves fact-based reasoning in LLMs by introducing highlighted references to key information.
UFO: A Unified Approach to Fine-grained Visual Perception – Integrate object detection, segmentation, and vision-language tasks under a single, open-ended framework.
L^ 2M: Mutual Information Scaling Law for Long-Context Language Modeling – Establishes theoretical foundations for improving long-range dependencies in language models.

That’s all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve