Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance
Why does NVIDIA remain virtually unchallenged in the AI chip market?
📝 Editorial: Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance

AI hardware is experiencing an innovation renaissance, with well-funded startups emerging everywhere. Yet NVIDIA remains virtually unchallenged, holding close to a 90% share of the AI chip market. Why is that? We’ve all heard explanations about the advantages of NVIDIA’s software stack for acceleration compared to platforms like AMD’s, but that seems like a lazy explanation for why NVIDIA is out-innovating its competitors. A simpler theory that I’ve discussed with several scientists and engineers who pretrain large foundation models is that NVIDIA is the only platform receiving regular feedback about chip performance during pretraining runs with tens of thousands of GPUs. It turns out that at that scale, many challenges arise that are nearly impossible to simulate at a smaller scale. I will elaborate on that theory in a future post, but the main point is that there is a very high barrier to entry when it comes to challenging NVIDIA chips for pretraining. The only viable candidate seems to be Google TPUs, which have definitely been tested at massive scale.

If pretraining is out of the equation, the obvious area to explore is inference. Here, we have a completely different playing field, where performance optimizations can be applied at a smaller scale, making it more conducive to startup disruption. One of the viable challengers to NVIDIA’s dominance in AI inference is Cerebras. Just last week, the well-funded startup unveiled Cerebras Inference, a solution capable of delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B. This is approximately 20x faster than NVIDIA GPU-based solutions and about 2.4x faster than Groq. The magic behind Cerebras’ performance is its wafer-scale chip design, which allows the entire model to be stored on-chip, eliminating the need for inter-GPU communication.
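The on-chip argument comes down to memory bandwidth: when generating one token at a time, every model weight must be read once per token, so decoding speed is bounded by bandwidth divided by model size. The back-of-the-envelope sketch below illustrates that bound; the bandwidth figures and byte widths are illustrative assumptions, not vendor-published specs.

```python
# Rough estimate of memory-bandwidth-bound decoding throughput.
# Illustrative numbers only; bandwidths here are assumptions, not specs.

def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          memory_bandwidth_tbps: float) -> float:
    """Each generated token streams every weight once, so throughput
    is capped at bandwidth / model size."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes = memory_bandwidth_tbps * 1e12
    return bandwidth_bytes / model_bytes

# A GPU with ~3.3 TB/s of off-chip HBM bandwidth serving Llama 3.1 70B
# in 16-bit precision:
gpu = max_tokens_per_second(70, 2, 3.3)    # ~24 tokens/s per device
# On-chip SRAM with (hypothetically) ~1,000 TB/s moves the same bound
# by orders of magnitude:
sram = max_tokens_per_second(70, 2, 1000)  # ~7,100 tokens/s
print(f"{gpu:.0f} vs {sram:.0f} tokens/s")
```

The sketch ignores batching, KV-cache traffic, and compute limits, but it shows why keeping the whole model in on-chip memory changes the single-stream speed ceiling rather than just shaving overhead.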
Cerebras Inference looks impressive from top to bottom and clearly showcases the massive potential for innovation in AI inference. Competing with NVIDIA will require more than just faster chips, but Cerebras appears to be a legitimate challenger.

🔎 ML Research

The Mamba in the Llama
Researchers from Princeton University, Together AI, Cornell University and other academic institutions published a paper proposing a technique to distill and accelerate hybrid transformer-SSM models. The method distills transformers into RNN-equivalents with a quarter of the hidden layers —> Read more.

Diffusion Models as Real-Time Game Engines
Google Research published a paper presenting GameNGen, a game engine powered by diffusion models trained on interactions with real environments over long trajectories. GameNGen can simulate the game DOOM at over 20 frames per second on a single TPU —> Read more.

LLMs that Learn from Mistakes
Researchers from Meta FAIR and Carnegie Mellon University published a paper outlining a technique to include error-correction data directly in the pretraining stage in order to improve reasoning capabilities. The resulting models outperform alternatives trained on error-free data —> Read more.

Table Augmented Generation
In a new paper, researchers from UC Berkeley proposed table augmented generation (TAG), a method that addresses some of the limitations of text-to-SQL and RAG for answering questions over relational databases. The TAG model captures a more complete set of interactions between an LLM and a database —> Read more.

DisTrO
Nous Research published a paper introducing DisTrO, an architecture that reduces inter-GPU communication by up to five orders of magnitude. DisTrO is an important method for low-latency training of large neural networks —> Read more.

Brain-Inspired Design
Microsoft Research published a summary of its recent research in three projects that simulate how the brain learns. One project simulates how the brain computes information, another enhances accuracy and efficiency, and the third improves proficiency in language processing and pattern recognition —> Read more.

🤖 AI Tech Releases

Qwen2-VL
Alibaba Research released a new version of Qwen2-VL, its marquee vision-language model —> Read more.

Cerebras Inference
Cerebras released an impressive inference solution that can generate 1,800 tokens per second for Llama 3.1 models —> Read more.

NVIDIA NIM Blueprints
NVIDIA released NIM Blueprints, a series of templates to help enterprises get started with generative AI applications —> Read more.

Gemini Models
Google DeepMind released a new series of experimental models —> Read more.

Command R
Cohere released a new version of Command R with improvements in coding, math, reasoning and latency —> Read more.

🛠 Real World AI

Recommendations at Netflix
Netflix discusses some of the AI techniques it uses to enhance long-term satisfaction with its content recommendations —> Read more.

📡 AI Radar