The Most Important Algorithm for Transformers
FlashAttention has a new version. Plus some important research milestones and major funding activity in AI.
📝 Editorial: The Most Important Algorithm for Transformers

Few algorithms have had as much impact on the recent generation of transformer architectures as FlashAttention. Originally developed by researchers from Princeton University, including the renowned Tri Dao, FlashAttention and its successor FlashAttention-2 improved the performance of attention mechanisms on GPUs by minimizing reads and writes to GPU memory. FlashAttention was rapidly adopted across the new generation of transformers almost immediately after the original publication. There were not many complaints about FlashAttention, but one of the few was that it could not take full advantage of new hardware architectures. For instance, FlashAttention-2 achieves only 35% utilization of the maximum FLOPs on H100 GPUs.

But now we have a new version. Last week, a group of AI researchers from Meta, Princeton University, NVIDIA, and other AI labs published the paper and open-source code for FlashAttention-3. The new version uses several techniques to speed up attention on H100 GPUs, exploiting the asynchrony of the tensor cores. The result is simple: FlashAttention-3 is blazing fast. It achieves 75% of the theoretical maximum FLOPs utilization on H100s, which translates into practical 1.5-2x performance improvements. The new algorithm can also work with lower-precision numbers, which reduces the memory footprint.

FlashAttention-3 is an exciting development in generative AI algorithms. It will almost certainly lead to larger context windows in LLMs and better inference performance on modern GPU architectures. Impressive progress! A simplified sketch of the tiling idea at the heart of FlashAttention appears at the end of this issue.

🔎 ML Research

FlashAttention-3
A group of AI researchers from Meta, Princeton University, Together AI, NVIDIA, and others published a paper unveiling the new version of the famous FlashAttention algorithm. FlashAttention-3 takes advantage of the latest GPU advancements, achieving up to 2x the performance of its predecessor and excelling in long-context LLM tasks —> Read more.

Sub-Billion Parameter Models for Mobile
Meta AI published a paper introducing MobileLLM, a sub-billion-parameter model optimized for on-device scenarios. MobileLLM uses a specific structure of embedding and attention layers that optimizes its efficiency relative to its size —> Read more.

Generative Teaching for Agents
Microsoft Research published a paper unveiling AgentInstruct, an agentic framework for creating synthetic data. Specifically, AgentInstruct focuses on datasets used for instruction tuning of base models —> Read more.

Evaluating Multimodal Foundation Models
Researchers from Carnegie Mellon University published a paper introducing the Holistic Evaluation of Multimodal Models (HEMM) framework. HEMM sets the primitives to systematically evaluate multimodal models across dimensions such as basic skills, information flow, and real-world use cases —> Read more.

A Unified AI Database
Microsoft Research published a paper proposing VBase, the foundation for a unified database for vector, relational, and scalar data types. The core of VBase is a property called relaxed monotonicity that enables queries over these different data types to be unified —> Read more.

Contamination in Code Generation Benchmarks
Researchers from Cohere published a paper providing evidence of the levels of contamination of code generation benchmarks in major LLMs. The paper also proposes Less Basic Python Problems, a new benchmark that is more resilient to contamination —> Read more.

Autoregressive Models for Text-Image Generation
The team behind the Generative AI Research Lab (GAIR) published a paper unveiling ANOLE, an autoregressive multimodal model for image and text generation. ANOLE is based on Meta AI’s Chameleon and follows a data- and parameter-efficient fine-tuning strategy —> Read more.

🤖 Cool AI Tech Releases

Claude High Quality Prompts
Anthropic released some features to evaluate and generate high-quality prompts for Claude —> Read more.

MInference
Microsoft released some demos of its MInference method for optimizing LLM inference performance —> Read more.

AutoGen Models
Microsoft AutoGen added support for non-OpenAI models —> Read more.

🛠 Real World AI

Ad Inference at Meta
Meta shares some details about the AI inference architecture powering its ad-serving system —> Read more.
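To make the editorial's point about minimizing memory reads and writes concrete, here is a minimal, illustrative sketch of the tiling and online-softmax idea that the FlashAttention family builds on. This is not the actual FlashAttention-3 kernel (the real implementation is a fused CUDA kernel that keeps tiles in on-chip SRAM and exploits tensor-core asynchrony); the function name, tile size, and tensor shapes below are assumptions made for the example.

```python
# Illustrative sketch only: a CPU-friendly reference for the tiling +
# online-softmax idea behind FlashAttention. The real algorithm is a fused
# GPU kernel that keeps these tiles in on-chip SRAM; the function name,
# tile size, and shapes here are assumptions for the example.
import torch

def tiled_attention(q, k, v, tile_size=128):
    """softmax(q @ k^T / sqrt(d)) @ v, one key/value tile at a time,
    without materializing the full (seq_len x seq_len) score matrix."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)                          # running (unnormalized) output
    row_max = torch.full((seq_len, 1), float("-inf"))  # running max per query row
    row_sum = torch.zeros(seq_len, 1)                  # running softmax denominator

    for start in range(0, seq_len, tile_size):
        k_tile = k[start:start + tile_size]            # "load" one K tile
        v_tile = v[start:start + tile_size]            # "load" one V tile
        scores = (q @ k_tile.T) * scale                # scores for this tile only

        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)      # rescale earlier partial results
        p = torch.exp(scores - new_max)                # tile-local exponentials

        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_tile
        row_max = new_max

    return out / row_sum                               # normalize once at the end

# Sanity check against the naive quadratic-memory implementation.
q, k, v = (torch.randn(512, 64) for _ in range(3))
naive = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), naive, atol=1e-4)
```

The key property is that the full seq_len x seq_len score matrix is never materialized: only one tile of scores exists at a time, and the running max/sum corrections keep the softmax numerically stable. This is the read-write saving the editorial describes, and it is what FlashAttention-3 pushes further with hardware-specific techniques on the H100.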