This Week in Turing Post:
Wednesday, AI 101, Model: What’s so cool about LLaVA-o1? Friday/Saturday after Thanksgiving is a holiday in the US. So →
🦃 What would you like to receive on that day 🦃
Thank you for voting! |
The main topic |
Today’s editorial is a little more technical than usual.
The AI model race has spiraled into chaos. OpenAI’s GPT-4o-2024-11-20, suspected to be faster but less capable than its predecessor, briefly edged out Gemini Exp 1114 – until Gemini Exp 1121 reclaimed the lead. Versioning has been abandoned in favor of date-stamped releases, leaving us with an endless cycle of incremental updates masquerading as breakthroughs. |
This unrelenting push for dominance has left developers and researchers frustrated. The anticipated "GPT-5" and "Claude 4" remain elusive, while the industry obsesses over achieving benchmarks. Labs prioritize leaderboard rankings over delivering meaningful progress, creating a landscape where clarity and innovation are often sacrificed for speed. |
There is one area, though, where the real work is happening: this week brought a few meaningful small models and research on embeddings.
Take Hymba, NVIDIA’s hybrid-head architecture combining transformer attention with state space models (SSMs). Hymba-1.5B not only outperforms larger counterparts like Llama-3.2-3B but does so with drastically reduced memory usage and boosted throughput. Its sliding window attention and learnable meta tokens are innovations worth celebrating. |
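To make the sliding-window idea concrete, here is a minimal numpy sketch of a sliding-window causal attention mask, one of the components the Hymba paragraph mentions. The window size and sequence length are illustrative choices, not Hymba’s actual configuration:

```python
import numpy as np

def sliding_window_causal_mask(seq_len, window):
    # True where position i may attend to position j:
    # causal (j <= i) and within the last `window` tokens (j > i - window).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=6, window=3)
```

Restricting each token to a fixed-size window is what turns attention’s quadratic memory cost into a linear one, which is where much of the memory saving in such hybrids comes from.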
Similarly, SlimLM redefines on-device AI. Built for smartphones, this compact language model balances privacy and efficiency, achieving impressive document assistance tasks directly on devices like the Samsung Galaxy S24. Small models are quietly making transformative contributions. |
Multimodality also sees breakthroughs. BlueLM-V-3B is a multimodal model built for mobile devices. It excels at multilingual OCR and image-to-text tasks, leveraging embeddings to optimize mobile efficiency.
Meanwhile, Jina CLIP v2 delivers multilingual, multimodal embeddings for text and images, combining power with compactness through Matryoshka representations. |
While "frontier labs" chase benchmarks, these compact models quietly redefine efficiency and usability, highlighting how smaller, more focused developments might be more meaningful than incremental improvements in large models. |
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free →
|
|
|
|
| 12 Studies on Sparse Autoencoders | Explore what is special about sparse autoencoders (SAEs) and how they are used for different purposes | www.turingpost.com/p/sparse-autoencoders-studies |
|
|
Weekly recommendation from AI practitioner👍🏼 |
Anthropic caught our AI practitioner's attention today with the launch of the Model Context Protocol (MCP), an open standard for connecting AI systems to various data sources. Designed to simplify integrations, it offers secure, scalable connections to tools like Google Drive and GitHub. Developers can access pre-built servers, SDKs, and an open-source repository to build smarter, context-aware AI. It’s a much-needed solution, and if others adopt it beyond Claude, it could be a game-changer. |
|
|
Top Research – it’s all about models today
Tülu 3: Open Models, Closed Gaps The Allen Institute for AI’s Tülu 3 raises the bar for open post-training. With curated prompts, synthetic fine-tuning, and a trailblazing RLVR framework, it bests Llama 3.1-Instruct and nips at the heels of proprietary systems on GSM8K and IFEval. Open-source innovation with competitive precision →read more |
Marco-o1: Alibaba’s Path to Open Reasoning Alibaba’s Marco-o1 embraces Chain-of-Thought tuning and Monte Carlo Tree Search to tackle open-ended challenges. With +6% MGSM gains and outperforming Google Translate in nuanced tasks, it redefines reasoning’s potential. Self-correcting, confident, and cutting-edge →read more |
DeepSeek-R1-Lite: Another take on OpenAI DeepSeek introduces R1-Lite-Preview, a reasoning AI designed for logical inference, math-heavy tasks, and real-time problem-solving. Leveraging chain-of-thought reasoning, it matches or surpasses OpenAI’s o1-preview on benchmarks like AIME and MATH. With plans to open source its R1 models and APIs, DeepSeek aims to energize AI innovation worldwide. Chinese companies are catching up →read more |
Bi-Mamba: Binary Brilliance Bi-Mamba, a joint effort from MBZUAI and CMU, makes 1-bit modeling a reality. It cuts storage by 80%, saves energy, and rivals full-precision models like Mamba-2. Tailored for low-bit hardware, it proves efficiency can shine without compromise →read more |
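For intuition on what “1-bit modeling” means, here is a common weight-binarization scheme sketched in numpy. This is an illustration of the general technique, not Bi-Mamba’s exact recipe; the toy matrix is made up:

```python
import numpy as np

def binarize_weights(W):
    # Keep only the sign of each weight (1 bit each), plus one
    # full-precision scale alpha = mean(|W|) so the binarized matrix
    # preserves the average magnitude of the original.
    alpha = np.abs(W).mean()
    return alpha * np.sign(W), alpha

W = np.array([[0.3, -0.1],
              [0.2, -0.4]])
Wb, alpha = binarize_weights(W)
```

Storing one bit per weight plus a single scale, instead of 16 bits per weight, is where storage cuts on the order of 80% come from.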
Pixtral Large: Mistral’s Multimodal Giant Mistral AI’s Pixtral Large, with 124B parameters, redefines multimodal AI. From documents to high-res images, it handles enterprise challenges effortlessly, outpacing GPT-4o and Claude-3.5 Sonnet on key tests. A new multimodal champion emerges →read more |
You can find the rest of the curated research at the end of the newsletter. |
|
We are reading |
|
|
News from The Usual Suspects © |
AlphaQubit: Sharpening Quantum’s Edge Google DeepMind’s AlphaQubit tackles quantum computing’s Achilles’ heel: error correction. By leveraging advanced neural networks, it boosts accuracy by 30% on Google’s Sycamore processor, surpassing traditional decoders. While too slow for real-time use, it’s a promising leap toward scalable quantum systems.
Qwen2.5-Turbo: 1 million token context! Alibaba’s Qwen2.5-Turbo shatters token limits with 1M-token contexts, slicing processing time by 4.3x thanks to sparse attention. Capable of analyzing vast texts or codebases, it outperforms GPT-4 and redefines cost efficiency at ¥0.3/1M tokens. Practical adoption remains a challenge, but the potential is game-changing.
H Dives Into the 'Agentic' Era with Runner H Paris-based AI startup H unveils Runner H, its first product after raising $220 million in funding. With a compact 2-billion-parameter LLM, the platform targets businesses with agentic tools for robotic process automation (RPA), quality assurance, and outsourcing. Runner H claims efficiency and performance beyond bigger rivals like Anthropic. A bold entry in AI's second era? Time will tell. We haven’t tried it, but we joined the waitlist.
Microsoft Copilot: The Future of Workflow At Ignite 2024, Microsoft unveiled new Copilot capabilities, automating tasks and enhancing collaboration. With features like Copilot Actions, Pages, and Teams’ Interpreter agent, productivity soars across global teams. The Copilot Control System ensures secure adoption, solidifying Microsoft’s AI leadership. Much more detail about these updates is on their blog.
Cerebras: Breaking the Speed Barrier Cerebras’ LLM inference processor now delivers molecular dynamics simulations 700 times faster than the Frontier supercomputer. What once took two years of computation can now be achieved in a day, redefining scientific research timelines. So cool.
Anthropic: $4B More for Responsible AI Amazon doubles down on Anthropic with a fresh $4B investment, as the company highlights progress on voluntary AI safety commitments. Building AI responsibly while competing at the highest levels. Read more
xAI: Elon Musk’s AI Ambitions Soar It’s kinda crazy, but Elon Musk’s xAI raised another $5B, doubling its valuation to $50B. With backing from heavyweights like the Qatar Investment Authority and Andreessen Horowitz, Musk’s vision for AI dominance accelerates.
|
|
🌏 We support 🌍 |
| Frugal AI Challenge | The goal of this challenge is to encourage both academic and industry actors to keep efficiency in mind when deploying AI models. By tracking both energy consumption and performance for different tasks, we can incentivize frugality in AI deployment while addressing real-world challenges. | frugalaichallenge.org |
|
|
More interesting research papers from last week |
Multimodal and Vision-Language Models |
|
Reinforcement Learning and Transfer Learning |
|
Retrieval and Knowledge Synthesis |
|
Memory, Efficiency, and Scaling Innovations |
|
Agent Architectures and Robotics |
|
Alignment, Verification, Guardrails, and Safety Frameworks |
|
Leave a review! |
|
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription! |
|
|
|