This Week in Turing Post:
- Wednesday, AI 101, Technique/Method: What is Flow Matching?
- Friday/Saturday, Agentic Workflow: Let's unfold the core elements, starting with Profiling and Knowledge
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free.
The main topic – Making a tradition of predicting together

Making predictions, especially about the future, is famously tricky, yet it remains a favorite year-end tradition. Antoine de Saint-Exupéry said it well: "Your task is not to foresee the future but to enable it." I believe that by choosing the right predictions, we enable the future the way we'd like it to be.

Last year, during the first week of December, Clem Delangue, CEO of Hugging Face and our dearest subscriber, published his predictions for 2024. We shared them and asked you to send us your own. We had an amazing response from people like Sara Hooker (Cohere), Yoshua Bengio (Mila), Max Hjelm (CoreWeave), and others. An analysis of which 2024 predictions came true will follow!

Today, we want to do the same, since Clem was right on time again. And let's make it a tradition!

clem 🤗 @ClementDelangue
> Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
> - There will be the first major public protest related to AI
> - A big company will see its market cap divided by two or more because of AI
> - At least 100,000 personal AI robots will be…
>
> 1:31 PM • Dec 2, 2024
The only thing we will change this time is that we're giving you three questions to start with. If you want to add more predictions, feel free to do that. You can also skip questions altogether (though we'd like you to answer at least one of them!).

Our three questions are:
1. Which paper of 2024 is so significant that it will change 2025? Or which paper surprised you the most?
2. Which industry will experience the most disruption from AI advancements in 2025?
3. What overlooked challenge in AI today will become a major focus in 2025? Or which research areas are currently overlooked?
Send us your thoughts – ks@turingpost.com – to be featured in the special Predictions Edition of Turing Post!

→ OR SIMPLY REPLY TO THIS EMAIL WITH YOUR PREDICTIONS ←

The main topic 2: Reasoning development on steroids

So, the conversations and actions around reasoning are heating up. In the last two weeks, we've been "exposed" to two very promising previews from China: DeepSeek-R1 and Alibaba's QwQ-32B, both attempting to challenge OpenAI's o1. Meanwhile, Google DeepMind is reportedly developing an AI model with advanced reasoning, leveraging chain-of-thought prompting.

(Quick reminder: in AI, reasoning refers to the ability of models to logically process information, analyze relationships, and generate coherent solutions or conclusions—key to achieving more human-like understanding and decision-making.)
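Since chain-of-thought prompting keeps coming up, here is a minimal sketch of the idea: the prompt simply asks the model to externalize its intermediate steps, and a small helper pulls the final answer back out. The prompt wording and the `Answer:` convention are our own illustration, not any lab's official recipe:

```python
# Minimal sketch of chain-of-thought prompting. We only build and parse
# text here; sending the prompt to a model is left to whichever client you use.

def cot_prompt(question: str) -> str:
    """Wrap a question so the model is nudged to reason step by step."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step, then state the final answer "
        "on its own line prefixed with 'Answer:'."
    )

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a step-by-step completion."""
    for line in reversed(completion.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip()  # fall back to the whole text

prompt = cot_prompt("A train covers 120 km in 2 hours. What is its average speed?")
print(extract_answer("Step 1: 120 / 2 = 60.\nAnswer: 60 km/h"))  # → 60 km/h
```

The point of the wrapper is that the intermediate steps themselves, not just the parsed answer, are what improves accuracy on multi-step problems.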
And though it is very tempting to jump into the conversation about reasoning, we decided to wait for QwQ's tech report – this model made an especially big splash, but it's still a preview. Allegedly, the report is due out in about a month.

We've been collecting papers to analyze alongside that report, and here is our list from the last week. Unsurprisingly, they all come from Chinese AI labs. Unsurprisingly, because you can see China's forte in action: breakthrough innovation might be harder for the country, but copying, catching up, and improving on that foundation is what it does outstandingly well. We will analyze and explain the significance of these and other reasoning-related papers later, when examining the QwQ report. For now, we're sharing the links if you want to dive right in:

- Shanghai Jiao Tong University and GAIR researchers surpassed OpenAI's o1-preview on AIME 2024 with a simple distillation method and limited samples. Their model excelled in safety/generalization but relied on the teacher model, urging first-principles research for sustainable AI innovation →read the paper
- Tsinghua University researchers found that LLMs using implicit reasoning skip step-by-step logic, relying on memory/intuition. Probing showed instability and less reliability versus explicit Chain-of-Thought, which is critical for accurate complex reasoning →read the paper
- Tsinghua University also introduced HiAR-ICL, automating reasoning in In-Context Learning with Monte Carlo Tree Search and "thought cards." HiAR-ICL emphasizes how to think, systematically addressing reasoning challenges via structured automation →read the paper
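For readers who want a one-screen refresher on distillation, the textbook logit-matching version can be sketched in a few lines. Note the hedge: the SJTU/GAIR paper distills from o1-preview's outputs, so its actual recipe differs; this is the classic KL-based variant, with made-up logits for illustration:

```python
import math

# Classic knowledge-distillation loss (not the paper's exact recipe):
# the student is trained to match the teacher's temperature-softened
# output distribution via KL divergence.

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that exactly matches the teacher incurs zero loss:
print(distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
```

The temperature softens both distributions so the student also learns from the teacher's "dark knowledge" (the relative probabilities of wrong answers), not just the argmax.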
Top 10 GitHub Repositories to Master AI, Machine Learning, and Data Science (a gem collection): www.turingpost.com/p/10-github-repositories-ai-ml-ds
Top Research (the last week was rich!)

An absolute hit from our Twitter (follow us there) →

TuringPost @TheTuringPost
> Natural Language Reinforcement Learning (NLRL) redefines Reinforcement Learning (RL). The main idea: in NLRL, the core parts of RL, like goals, strategies, and evaluation methods, are reimagined using natural language instead of rigid math. What are the benefits?
> - NLRL uses not…
>
> 1:00 AM • Nov 29, 2024
- Star Attention: NVIDIA introduced a block-sparse attention mechanism for Transformer-based LLMs. It uses local/global attention phases to achieve up to 11x inference speedup on sequences up to 1M tokens while retaining 95-100% accuracy →read the paper
- LLM-as-a-Judge: Arizona State University reviewed using LLMs for judgment tasks. They presented a taxonomy of methodologies and applications, highlighting bias, vulnerabilities, and self-judgment, with future directions in human-LLM collaboration and bias mitigation →read the paper
- MH-MoE: Microsoft's Multi-Head Mixture-of-Experts (MH-MoE) improves sparse MoE by adding multi-head attention, reducing perplexity without increasing FLOPs, and demonstrating robust performance under quantization →read the paper
- Boundless Socratic Learning: Google DeepMind's framework leverages recursive language-based "games" for self-improvement, meeting conditions of feedback, coverage, and scalability. It suggests a roadmap for scalable AI via autonomous data generation and feedback loops →read the paper
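To make the Star Attention item concrete, here is a toy sparsity mask in its spirit: context tokens attend locally within their own block (plus a shared "anchor" block at the start), while the query tokens at the end attend globally. The block layout and anchor choice are simplifications of NVIDIA's actual two-phase scheme:

```python
# Toy two-phase block-sparse mask inspired by Star Attention (details
# differ from NVIDIA's implementation). mask[i][j] == True means token i
# may attend to token j.

def star_mask(n_ctx, block, n_query):
    n = n_ctx + n_query
    mask = [[False] * n for _ in range(n)]
    for i in range(n_ctx):
        start = (i // block) * block
        for j in range(start, min(start + block, n_ctx)):
            mask[i][j] = True          # phase 1: local block attention
        for j in range(min(block, n_ctx)):
            mask[i][j] = True          # anchor block shared by all blocks
    for i in range(n_ctx, n):
        for j in range(n):
            mask[i][j] = True          # phase 2: global attention for queries
    return mask

m = star_mask(n_ctx=6, block=2, n_query=2)
print(sum(row.count(True) for row in m))  # → 36, far fewer than the dense 64
```

The speedup comes from exactly this sparsity: the quadratic cost is paid only over small blocks during context encoding, and globally only for the short query.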
You can find the rest of the curated research at the end of the newsletter.

We are reading/watching

- Simon Willison: The Future of Open Source and AI (Around the Prompt #10)
- Ben Thompson's article, "The Gen AI Bridge to the Future," argues that generative AI is the key link between today's devices and the wearable computing era, enabling transformative, context-aware interactions.
- Devansh's piece on Dostoevsky (how could I miss that!) explores the dangers of over-rationalization and idol worship, emphasizing unconditional love as a remedy for societal pitfalls. Read it here.
- MIT Tech Review's "AI Minecraft Experiment Breakthrough" showcases Altera's Project Sid, where 1,000 LLM-powered agents in Minecraft formed communities, jobs, and even a parody religion, hinting at AI's potential to model human dynamics.
- The New Yorker highlights robotics' "ChatGPT moment," as AI-driven learning gives robots dexterity and general-purpose capabilities. Explore the revolution.
News from The Usual Suspects ©

Google DeepMind revolutionizes 4D content creation and reinvents time-series analysis with visuals
DeepMind's CAT4D framework takes scene reconstruction to the next dimension—literally. Combining multi-view video diffusion with cutting-edge deformable Gaussian models, it reimagines 4D (dynamic 3D) filmmaking, AR, and synthetic content creation. State-of-the-art results, no external priors needed. Lights, camera, 4D action!
Google's multimodal models turn time-series data into plot-based prompts, boosting accuracy by 120% and cutting costs tenfold. From fall detection to physical activity tracking, the future of analysis is picture-perfect.
Microsoft's LazyGraphRAG sets a new RAG benchmark
Think smarter, not costlier. LazyGraphRAG skips pre-indexing and crushes costs to 0.1% of its rivals'. Merging local agility with global prowess, it's 700x cheaper and twice as sharp in data analysis. Perfect for those who hate overspending on exploratory AI.
GitHub funds open source security

Anthropic's MCP bridges AI and data
Anthropic unveils the Model Context Protocol (MCP), an open standard that connects AI tools with diverse data sources. By unifying fragmented integrations, MCP lets AI assistants interact seamlessly with systems like Google Drive, GitHub, and Slack.
Meta AI speeds up AI training with SPDL
Meta's new multi-threading framework, SPDL, streamlines data handling for AI training. Faster loading, better scaling—because time is (computing) money.
Andrew Ng simplifies LLM integration with 'aisuite'
Amazing models from the last week

- OLMo 2 by Allen AI: Allen AI unveils OLMo 2, with 7B and 13B parameter models trained on 5 trillion tokens.
- Alibaba's QwQ-32B: Alibaba's QwQ-32B stirs excitement with strong math, coding, and reasoning benchmarks, placing it between Claude 3.5 Sonnet and OpenAI's o1-mini. Optimized for consumer GPUs via quantization, it's open-sourced under Apache, revealing reasoning tokens and weights, yet shows Chinese regulatory constraints. A tech report is expected in a month.
- ShowUI (GUI automation): Show Lab, NUS, and Microsoft introduce ShowUI, a 2B vision-language-action model tailored for GUI tasks. It features UI-guided token selection (33% fewer tokens), interleaved streaming for multi-turn tasks, and a curated 256K dataset, achieving 75.1% zero-shot grounding accuracy.
- Adobe's MultiFoley: Adobe debuts MultiFoley, an AI model generating high-quality sound effects from text, audio, and video inputs. Cool demos highlight its creative potential.
- INTELLECT-1 by Prime Intellect: INTELLECT-1, a 10B LLM trained over 42 days on 1T tokens across 14 global nodes, leverages the PRIME framework for exceptional efficiency (400× bandwidth reduction). The open-sourced INTELLECT-1 and PRIME signal a leap in decentralized-training scalability.
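On the quantization point in the QwQ-32B item: the reason a 32B model can fit on consumer GPUs is simply fewer bits per weight. A toy symmetric int8 round-trip shows the core trade-off (real inference stacks use per-group scales and fused kernels, which this omits):

```python
# Toy symmetric int8 quantization: store weights as small integers plus
# one float scale, trading a tiny reconstruction error for ~4x less
# memory than float32.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]            # ints in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.12, -0.5, 0.33, 0.02]          # made-up weights for illustration
q, s = quantize_int8(w)
restored = dequantize(q, s)
print(max(abs(a - b) for a, b in zip(w, restored)))  # small error, below one scale step
```

The reconstruction error is bounded by half a scale step per weight, which is why quantized models usually lose little accuracy while the memory footprint drops sharply.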
More interesting research papers from last week

- Large Language Model-Brained GUI Agents: A Survey examines the evolution and application of LLM-powered GUI agents, highlighting their frameworks, datasets, and benchmarks for automating GUI interactions.
- From CISC to RISC: Language-Model Guided Assembly Transpilation enables energy-efficient assembly translation between instruction sets with LLM-guided transpilation techniques.
Storytelling and Creativity

- Dreamrunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation enables smooth storytelling video generation with fine-grained motion and multi-object consistency using hierarchical plans and retrieval-based adaptation.
- DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting decouples object attributes for editable image inpainting, enhancing attribute modification while preserving realism and identity.
- DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching eliminates fine-tuning with a feature caching approach for lightweight personalized image generation, reducing computational costs.
Novel Methodologies and Predictive Insights

- Predicting Emergent Capabilities by Finetuning forecasts LLM capability emergence using finetuning data, enabling cost-efficient pretraining evaluations and capability predictions.
- Training and Evaluating Language Models with Template-based Data Generation generates large-scale datasets for math reasoning via meta-template synthesis, enhancing reasoning in LLMs.
- Best of Both Worlds: Advantages of Hybrid Graph Sequence Models combines graph tokenization and hybrid architectures for improved parameter efficiency and state-of-the-art performance.
- Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding improves LLM efficiency by dynamically adjusting draft lengths in speculative decoding, enhancing inference consistency.
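The speculative-decoding paper's idea, stripped down to a toy: a cheap draft model proposes tokens until its own confidence drops, and the target model then verifies a prefix of the draft. The confidence values and accept rule below are simulated placeholders, not the paper's actual policy:

```python
import random

# Toy speculative-decoding loop: drafting stops early once the draft's
# own confidence falls (the paper's self-verification idea, reduced here
# to a simple threshold), then the target model accepts a prefix.

def draft_tokens(step, max_len=8, confidence_floor=0.6):
    """Propose tokens until the (simulated) draft confidence drops."""
    tokens = []
    for i in range(max_len):
        conf = 0.95 - 0.1 * i          # stand-in for the draft's token prob
        if conf < confidence_floor:
            break                       # self-verification: stop drafting
        tokens.append(f"t{step}_{i}")
    return tokens

def verify(tokens, accept_prob=0.9, rng=random.Random(0)):
    """Target model accepts a prefix; a rejection cuts the draft short."""
    accepted = []
    for t in tokens:
        if rng.random() > accept_prob:  # simulated rejection by the target
            break
        accepted.append(t)
    return accepted

drafted = draft_tokens(step=0)
print(len(drafted), len(verify(drafted)))  # → 4 4
```

Stopping the draft early matters because every rejected draft token is wasted work: a length policy that drafts only while confidence is high keeps the verified-prefix ratio, and thus the speedup, consistent.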
Please share this newsletter with your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!