The Most Important Algorithm for Transformers
FlashAttention has a new version. Plus some important research milestones and major funding activity in AI.
📝 Editorial: The Most Important Algorithm for Transformers

Few algorithms have had as much impact on the recent generation of transformer architectures as FlashAttention. Originally developed by researchers at Princeton University, including the renowned Tri Dao, FlashAttention and its successor FlashAttention-2 improved the performance of attention mechanisms on GPUs by minimizing reads and writes between levels of the GPU memory hierarchy. Almost immediately after the original publication, FlashAttention was rapidly adopted by the new generation of transformers. There were not many complaints about FlashAttention, but one of the few was that it could not take full advantage of new hardware architectures. For instance, FlashAttention-2 achieves only 35% utilization of the theoretical maximum FLOPs on H100 GPUs.

But now we have a new version. Last week, a group of AI researchers from Meta, Princeton University, NVIDIA, and other AI labs published the paper and open-source code for FlashAttention-3. The new version uses several techniques to speed up attention on H100 GPUs, exploiting the asynchrony of the Tensor Cores. The result is simple: FlashAttention-3 is blazing fast. The new algorithm achieves 75% of theoretical maximum FLOP utilization on H100s, which translates into practical 1.5-2x performance improvements. It can also use lower-precision numbers, which reduces the memory footprint.

FlashAttention-3 is an exciting development in generative AI algorithms. It will almost certainly lead to improvements in large context windows in LLMs and better inference performance on modern GPU architectures. Impressive progress!

🔎 ML Research

FlashAttention-3
A group of AI researchers from Meta, Princeton University, Together AI, NVIDIA, and others published a paper unveiling the new version of the famous FlashAttention algorithm.
FlashAttention-3 takes advantage of the latest GPU advancements, achieving 2x the performance of its predecessor and also excelling in long-context LLM tasks —> Read more.

Sub-Billion Parameter Models for Mobile
Meta AI published a paper introducing MobileLLM, a sub-billion-parameter model optimized for on-device scenarios. MobileLLM uses a specific structure of embedding and attention layers that optimizes its efficiency relative to its size —> Read more.

Generative Teaching for Agents
Microsoft Research published a paper unveiling AgentInstruct, an agentic framework for creating synthetic data. Specifically, AgentInstruct focuses on datasets used for instruction tuning of base models —> Read more.

Evaluating Multimodal Foundation Models
Researchers from Carnegie Mellon University published a paper introducing the holistic evaluation of multimodal models (HEMM) framework. HEMM sets the primitives to systematically evaluate multimodal models across dimensions such as basic skills, information flow, and real-world use cases —> Read more.

A Unified AI Database
Microsoft Research published a paper proposing VBase, the foundation for a unified database for vector, relational, and scalar data types. The core of VBase is a property called relaxed monotonicity that enables the unification of the different data type models —> Read more.

Contamination in Code Generation Benchmarks
Researchers from Cohere published a paper providing evidence of the levels of contamination of code generation benchmarks in major LLMs. The paper also proposes Less Basic Python Problems, a new benchmark more resilient to contamination —> Read more.

Autoregressive Models for Text-Image Generation
The team behind the Generative AI Research Lab (GAIR) published a paper unveiling ANOLE, an autoregressive multimodal model for image and text generation. ANOLE is based on Meta AI's Chameleon, which guarantees a data- and parameter-efficient fine-tuning strategy —> Read more.
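The core idea the FlashAttention family builds on — computing exact attention over blocks of keys and values with an online softmax, so the full N x N score matrix is never materialized — can be sketched in a few lines of NumPy. This is an illustration of the general tiling technique only, not the fused CUDA kernels from the papers; the block size and shapes are arbitrary choices for the example.

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference attention: materializes the full N x N score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=16):
    """Same result, but keys/values are streamed in blocks with an online softmax."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    row_max = np.full(n, -np.inf)   # running max of scores seen so far, per query row
    row_sum = np.zeros(n)           # running softmax denominator, per query row
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                       # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=-1))
        correction = np.exp(row_max - new_max)       # rescale earlier partial results
        p = np.exp(s - new_max[:, None])
        out = out * correction[:, None] + p @ vb
        row_sum = row_sum * correction + p.sum(axis=-1)
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v), atol=1e-6)
```

The per-block rescaling is what makes the streaming softmax exact; on real hardware the win comes from keeping each block in fast on-chip memory instead of reading and writing the full score matrix to HBM.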
🤖 Cool AI Tech Releases

Claude High Quality Prompts
Anthropic released some features to evaluate and generate high-quality prompts for Claude —> Read more.

MInference
Microsoft released some demos of its MInference method for optimizing LLM inference performance —> Read more.

AutoGen Models
Microsoft AutoGen added support for non-OpenAI models —> Read more.

🛠 Real World AI

Ad Inference at Meta
Meta shares some details about the AI inference architecture powering its ad-serving system —> Read more.

📡 AI Radar
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.