This Week in Turing Post:
Wednesday, AI 101: Everything about the Whisper Model
Friday, Agentic Workflows series: Use cases
The main topic

If in my childhood someone had told me I would set Matryoshka against Transformer, I would have been puzzled. After all, one is a symbol of traditional Russian craftsmanship – stacking dolls within dolls, each revealing something hidden beneath. The other? A futuristic robot capable of morphing into various forms, epitomizing adaptability. Yet here we are, years later, first using 'Matryoshka' to describe layered, nested representation learning within 'Transformer' architectures – and then using Matryoshka in a rival architecture!

The first merging of the concepts happened in 2023, when researchers from Google Research presented MatFormer. In it, each Transformer block is designed with nested sub-blocks, where smaller submodels (like the layers of a Matryoshka doll) are contained within larger ones. This lets the model dynamically extract submodels of varying sizes from a single universal model without separate training, allowing flexible scaling and elastic inference across tasks and modalities. The underlying idea is called Matryoshka Representation Learning.

This approach allows scaling the model down by using only specific parts of it while still retaining the necessary knowledge and performance. The smaller submodels work efficiently without additional training because they share the same underlying representation space as the larger model.

Recently, however, Transformers have been facing increasing critique. AI21 CEO Ori Goshen challenges the supremacy of Transformers, arguing that agents relying on these models struggle with efficiency and cost. He – understandably – advocates for AI21's Jamba architecture, based on Mamba, claiming it promises faster, more reliable AI agents with better memory performance.

Well, Mamba, as we've explained before, is indeed a legitimate candidate to rival Transformers. But what if we combine it with the good old Matryoshka to deal an even bigger blow to Transformers?

Researchers from Scaled Foundations and the University of Washington did exactly that. MatMamba integrates Matryoshka Representation Learning into Mamba2's State Space Model (SSM), creating a flexible, nested structure across its parameters. This design allows multiple smaller models to be extracted from a single large model without retraining. Each submodel retains the critical learned representations, ensuring consistent performance across sizes.

Compared to MatFormer and Transformers, MatMamba offers faster inference – especially for long sequences – thanks to its SSM backbone, and more granular, adaptive scaling across compute requirements.

For example, on edge devices with limited resources, MatMamba can dynamically extract smaller models without retraining, allowing inference to adjust to the available memory or compute – something Transformers struggle with due to their rigid architecture.

In cloud inference scenarios, where compute resources fluctuate, MatMamba's ability to switch flexibly between submodels allows efficient, real-time scaling. While Transformers dominate general-purpose tasks, MatMamba could surpass them in domains where long context and elastic deployment are needed, such as real-time video analysis or large-scale image retrieval.

To be realistic, MatMamba is unlikely to replace Transformers entirely, as the two excel at different tasks. Instead, it may carve out a niche in applications demanding both high efficiency and adaptive, scalable inference.
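To make the nesting idea concrete, here is a minimal sketch in PyTorch – our own illustration, not the MatFormer or MatMamba authors' code – of a Matryoshka-style layer in which every prefix of the full weight matrix is itself a working smaller layer:

```python
# Minimal sketch of Matryoshka-style nesting (illustrative only, not the
# MatFormer/MatMamba implementation): the top-left d x d block of the full
# weight matrix is a valid smaller layer, so one set of parameters serves
# several model sizes without retraining.
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    def __init__(self, d_full: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_full, d_full) / d_full ** 0.5)
        self.bias = nn.Parameter(torch.zeros(d_full))

    def forward(self, x: torch.Tensor, d=None) -> torch.Tensor:
        d = d or self.weight.shape[0]
        # Slice out the nested sub-layer: the first d input/output dimensions.
        return x[..., :d] @ self.weight[:d, :d].T + self.bias[:d]

layer = MatryoshkaLinear(d_full=1024)
x = torch.randn(2, 1024)
full = layer(x)          # the full-width model
small = layer(x, d=256)  # a nested submodel, extracted with no retraining
print(full.shape, small.shape)  # torch.Size([2, 1024]) torch.Size([2, 256])
```

In MatFormer and MatMamba the same principle is applied to whole Transformer and Mamba2 blocks, and the nested widths are trained jointly so that each slice remains a usable model on its own.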
As multi-agent ecosystems emerge, we will see more attempts to create alternatives to Transformers that may steal the spotlight.

💎 We recommend – Expert insights at GenAI Productionize 2.0

Don't miss GenAI Productionize 2.0 – the premier conference for GenAI application development, featuring AI experts from leading brands, startups, and research labs!

Learn actionable insights, strategies, and techniques for generative AI stack design, governance, evaluation, and observability. But don't take our word for it; here are real quotes from previous attendees:

"I'm blown away by the high quality and value of this event." – Ricardo B.
"Great event - worth getting up at 4am in the morning for!" – Sandy A.
"Spectacular and very insightful summit! Very well done!" – Chad B.
10 New Approaches for Making Transformers More Efficient
While Transformers are critiqued, they still dominate the AI world. Learn how to make them more efficient: www.turingpost.com/p/10-new-approaches-for-making-transformers-more-efficient
News from The Usual Suspects ©

Adobe Unleashes Generative Fireworks at MAX
Adobe drops major updates at its MAX conference, expanding its Firefly AI with the first video model safe for commercial use. New AI tools in Premiere Pro help smooth transitions and extend clips, while over 100 new Creative Cloud features land across flagship apps. Also in the mix: collaborative creativity via Project Concept and the GenStudio platform for marketing pros. Oh, and Gatorade bottles – now personalized with Firefly.
Two Nobel Prizes (in Chemistry and Physics) were awarded for achievements rooted in Deep Learning! We explained what for in our ML flashcards.

OpenAI's Swarm of AI Workers
OpenAI's latest cookbook introduces "routines" and "handoffs" to orchestrate AI agents more efficiently, making the leap from flashy demos to robust multi-agent workflows. With tools like Swarm, AI agents can now smoothly pass conversations to each other, handling tasks such as refunds, sales, and support, all while minimizing bottlenecks in the process. Enterprise AI just got smarter.
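To illustrate the routine-and-handoff pattern, here is a toy sketch in plain Python with hypothetical names – not the Swarm library's actual API. An agent is a routine with instructions and tools, and a tool that returns another agent is a handoff:

```python
# Toy illustration of "routines and handoffs" (hypothetical sketch, not the
# Swarm API): an agent bundles instructions and tools; when a tool returns
# another Agent, the conversation is handed off to it.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    name: str
    instructions: str
    tools: Dict[str, Callable] = field(default_factory=dict)

def issue_refund(order_id: str) -> str:
    return f"Refund issued for order {order_id}"

def transfer_to_refunds() -> "Agent":
    # Returning an Agent signals a handoff.
    return refunds_agent

refunds_agent = Agent(
    name="Refunds",
    instructions="Process refunds politely.",
    tools={"issue_refund": issue_refund},
)
triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    tools={"transfer_to_refunds": transfer_to_refunds},
)

def run_turn(agent: Agent, tool_name: str, *args):
    """Execute one tool call; if it returns an Agent, that's a handoff."""
    result = agent.tools[tool_name](*args)
    return result if isinstance(result, Agent) else (agent, result)

agent = run_turn(triage_agent, "transfer_to_refunds")        # handoff to Refunds
print(run_turn(agent, "issue_refund", "A-1234")[1])          # Refund issued for order A-1234
```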
TSMC: AI's Chip Champion
TSMC's third-quarter profits are set to soar 40%, fueled by surging AI chip demand from tech giants like Apple and Nvidia. As the world's leading contract chipmaker, TSMC is expanding globally, spending $65 billion on U.S. factories, but keeping most production in Taiwan. With shares up 77% this year, TSMC is riding high on the AI boom.
Anthropic in its Loving Grace
Dario Amodei's 15,000-word investor pitch introduces a new term, 'Powerful AI', instead of AGI. More practical: Anthropic rolls out the Message Batches API, cutting costs by 50% for developers dealing with massive datasets. Now you can batch up to 10,000 queries with Claude 3.5 Sonnet, Opus, and Haiku, processed within 24 hours. Perfect for non-time-sensitive work, this API offers scalable data analysis minus the infrastructure headaches. Quora's already onboard, loving the smooth ride.
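As a rough sketch of what batching looks like from the Python SDK (the method path below is an assumption – the Batches API shipped under a beta namespace at launch, so check Anthropic's docs for the exact names and current model IDs):

```python
# Rough sketch of submitting a message batch with the Anthropic Python SDK.
# NOTE: the `client.beta.messages.batches.create` path and the model ID are
# assumptions; verify against the official documentation before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

requests = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": f"Summarize document {i}."}],
        },
    }
    for i in range(3)  # up to 10,000 requests per batch
]

batch = client.beta.messages.batches.create(requests=requests)
print(batch.id, batch.processing_status)  # results arrive within 24 hours
```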
Gradio 5: Web Apps on Rocket Fuel
Hugging Face launches Gradio 5, amping up ML web apps with sleek design, server-side rendering for lightning-fast loads, and real-time streaming. Low-latency, production-ready apps with just a few lines of Python, plus an AI playground that lets you create apps right in your browser.
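For a sense of what "a few lines of Python" means in practice, here is the canonical minimal Interface example (generic Gradio usage; Gradio 5's server-side rendering and streaming improvements happen under the hood):

```python
# Minimal Gradio app: the core Interface API behind "a few lines of Python".
import gradio as gr

def greet(name: str, intensity: int) -> str:
    return "Hello, " + name + "!" * int(intensity)

demo = gr.Interface(
    fn=greet,
    inputs=["text", "slider"],
    outputs="text",
)

if __name__ == "__main__":
    demo.launch()  # serves the app locally; Gradio 5 adds SSR for faster loads
```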
Writer's Palmyra X 004 Takes Action
Writer introduces Palmyra X 004, a powerhouse AI model built to handle enterprise tasks with finesse. Now with tool-calling capabilities, it automates workflows across apps, pulling data, running code, and even sending emails. This LLM also leads the pack in performance benchmarks, showing up OpenAI and Anthropic.
Wondering what Inflection AI has been up to?
Inflection AI, in collaboration with Intel, launches Inflection for Enterprise, running on Intel Gaudi® 3 and powered by the high-performing Inflection 3.0 model. Designed for businesses that need more than a chatbot, it offers full control over data, models, and architecture – on-prem, cloud, or hybrid.
We are reading

The freshest research papers, categorized for your convenience

Our TOP

TuringPost @TheTuringPost
Differential Transformer, proposed by @MSFTResearch and @Tsinghua_Uni, helps the model pay more attention to important info. It uses differential attention to subtract one attention map from another, reducing noise and highlighting relevant parts. Here's how it works:

11:07 AM • Oct 11, 2024
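The core mechanism is easy to sketch: compute two softmax attention maps from two sets of queries and keys, then subtract one from the other, scaled by a learnable λ. Below is a simplified single-head sketch under our own assumptions (the paper additionally re-parameterizes λ and applies per-head normalization):

```python
# Simplified single-head sketch of differential attention (illustrative only;
# the actual paper re-parameterizes lambda and adds per-head GroupNorm).
import torch
import torch.nn.functional as F

def differential_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Attention = (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V."""
    d = Wq1.shape[1]
    q1, k1 = x @ Wq1, x @ Wk1   # queries/keys for the first attention map
    q2, k2 = x @ Wq2, x @ Wk2   # queries/keys for the second attention map
    v = x @ Wv
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d ** 0.5, dim=-1)
    return (a1 - lam * a2) @ v  # subtracting the maps cancels common "noise"

# Toy usage
torch.manual_seed(0)
x = torch.randn(1, 8, 64)                        # (batch, seq, dim)
W = [torch.randn(64, 64) / 8 for _ in range(5)]
out = differential_attention(x, *W)
print(out.shape)                                 # torch.Size([1, 8, 64])
```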
AI Model Architectures & Optimization

Retrieval-Augmented Decision Transformer: External Memory for In-Context RL – incorporates external memory into reinforcement learning, improving in-context learning with reduced reliance on long episodes. Read the paper
OPTIMA: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System – enhances multi-agent system performance by using LLMs with reduced communication complexity and token usage while increasing task performance. Read the paper
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations – proposes larger convolutional kernels for ConvNets to improve spatial information capture and outperform vision transformers in various tasks. Read the paper
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention – improves LLM decoding efficiency by employing sparse attention, reducing memory and computational costs. Read the paper
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe – introduces a novel instruction-tuning approach that improves LLM performance on instruction-following tasks by mitigating overfitting. Read the paper
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-Translated Mathematical Code – enhances LLM mathematical reasoning by pretraining on a math-focused dataset, improving performance on math-related tasks. Read the paper
ϵ-VAE: Denoising as Visual Decoding – proposes a new visual autoencoder method that improves both image reconstruction and generation through an iterative denoising process. Read the paper
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation – introduces a new fine-tuning method that redistributes ranks in activation vectors to maximize explained variance, improving task performance. Read the paper
ONLY-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization – demonstrates that diverse instruction types are essential for LLMs to generalize well to new tasks, highlighting the importance of varied datasets. Read the paper
Inference Scaling for Long-Context Retrieval Augmented Generation – optimizes retrieval-augmented generation by scaling inference parameters, improving performance for long-context and multi-hop queries. Read the paper
AI Agents & Agentic Frameworks

AGENT S: An Open Agentic Framework that Uses Computers Like a Human – mimics human interaction with computers through a GUI, performing complex multi-step tasks autonomously using memory-based learning. Read the paper
WALL-E: World Alignment by Rule Learning Improves World Model-Based LLM Agents – aligns LLMs with environment dynamics through rule learning, improving decision-making and reducing errors in real-world tasks. Read the paper
Emergent Properties with Repeated Examples – demonstrates that repeated training examples can significantly enhance model performance, especially in tasks with smaller datasets. Read the paper
Learning, Safety & Alignment in AI

DATA ADVISOR: Dynamic Data Curation for Safety Alignment of Large Language Models – improves the safety of LLMs by dynamically refining data generation, targeting underrepresented safety issues. Read the paper
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning – uses Monte Carlo Tree Search to enable LLMs to self-improve in reasoning tasks by refining stepwise training. Read the paper
Self-Boosting Large Language Models with Synthetic Preference Data – enables LLMs to improve themselves by generating synthetic preference data for better task performance. Read the paper
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders – explores vulnerabilities in LLMs where they can unintentionally recommend malicious code, emphasizing the need for improved safeguards. Read the paper
Multimodal and Multitasking Capabilities

Everything Everywhere All At Once: LLMs Can In-Context Learn Multiple Tasks in Superposition – reveals that LLMs can perform multiple distinct tasks simultaneously during a single inference, offering insights into task superposition capabilities. Read the paper
Token-Level Detective Reward Model for Large Vision Language Models – introduces a reward model that provides fine-grained feedback at the token level for multimodal models, enhancing error diagnosis and correction. Read the paper
Personalized Visual Instruction Tuning – enhances LLMs' ability to conduct personalized conversations by training models to recognize specific individuals in images. Read the paper
Novel AI Capabilities & Creativity

Diversity-Rewarded CFG Distillation – promotes creativity in generative models by distilling Classifier-Free Guidance into model weights, reducing computational cost while maintaining high diversity in outputs. Read the paper
SUPERCORRECT: Supervising and Correcting Language Models with Error-Driven Insights – improves reasoning in smaller LLMs by using hierarchical guidance from larger models and enhancing error correction. Read the paper
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations – explores how LLMs internally encode truthfulness information and how this data can be leveraged to reduce hallucinations. Read the paper
Specialized AI Systems & Task-Specific Performance

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching – introduces a text-to-speech model that achieves high-quality, zero-shot speech generation and code-switching by using a non-autoregressive approach. Read the paper
Erasing Conceptual Knowledge from Language Models – proposes a framework for selectively erasing specific conceptual knowledge from LLMs while preserving overall fluency and accuracy in other tasks. Read the paper
STUFFED MAMBA: State Collapse and State Capacity of RNN-based Long-Context Modeling – explores challenges in RNN-based models for long-context modeling, proposing solutions to mitigate performance degradation over long sequences. Read the paper
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA – studies the complementary strengths of humans and AI in question answering, showing where each excels in different reasoning tasks. Read the paper
Models

TinyEmo: Scaling Down Emotional Reasoning via Metric Projection – a small multimodal model for emotion classification, leveraging a synthetic emotional dataset and a Metric Projector for efficient task handling, outperforming much larger models in emotion-related tasks →read the paper
Falcon Mamba: The First Competitive Attention-Free 7B Language Model – a 7B model that achieves superior performance in long-context processing and inference speed, all without attention mechanisms, surpassing larger models across benchmarks →read the paper
Pixtral 12B – a 12B-parameter multimodal model excelling in both image and text understanding, offering state-of-the-art performance on multimodal and text-only tasks and outperforming similarly sized and larger models →read the paper
Baichuan-Omni Technical Report – a 7B open-source multimodal model processing text, images, videos, and audio, excelling particularly in Chinese benchmarks and providing robust performance across diverse modalities →read the paper
ARIA: An Open Multimodal Native Mixture-of-Experts Model – excels in multimodal tasks, with competitive performance in both language and multimodal benchmarks, offering enhanced long-context handling and surpassing proprietary models like GPT-4o →read the paper

Leave a review!

Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!