Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance
Why does NVIDIA remain virtually unchallenged in the AI chip market?
📝 Editorial: Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance

AI hardware is experiencing an innovation renaissance, with well-funded startups emerging everywhere. Yet NVIDIA remains virtually unchallenged, holding close to a 90% share of the AI chip market. Why is that? We’ve all heard that the answer lies in the advantages of NVIDIA’s acceleration software stack over platforms like AMD’s, but that seems like a lazy explanation for why NVIDIA keeps out-innovating its competitors.

A simple theory that I’ve discussed with several scientists and engineers who pretrain large foundation models is that NVIDIA is the only platform that regularly receives feedback on chip performance from pretraining runs spanning tens of thousands of GPUs. At that scale, many challenges arise that are nearly impossible to simulate on smaller clusters. I will elaborate on this theory in a future post, but the main point is that the barrier to entry for challenging NVIDIA in pretraining is very high. The only viable contender seems to be Google’s TPUs, which have certainly been tested at massive scale.

If pretraining is out of the equation, the obvious area to explore is inference. Here the playing field is completely different: performance optimizations can be applied at a smaller scale, which makes it more conducive to disruption by startups. One of the viable challengers to NVIDIA’s dominance in AI inference is Cerebras. Just last week, the well-funded startup unveiled Cerebras Inference, a solution capable of delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B. This is approximately 20x faster than NVIDIA GPU-based solutions and about 2.4x faster than Groq. The magic behind Cerebras’ performance is its wafer-scale chip design, which allows the entire model to be stored on-chip, avoiding the off-chip memory traffic that bottlenecks GPU-based inference.
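To put these headline figures in perspective, here is a back-of-the-envelope sketch in Python that converts throughput into per-token latency and backs out the competitor throughput that the quoted speedups would imply. It takes the 1,800 and 450 tokens-per-second figures at face value and assumes, as an unverified simplification, that both the 20x and the 2.4x ratios are measured against the Llama 3.1 8B figure. The helper function names are hypothetical.

```python
# Back-of-the-envelope throughput arithmetic (hypothetical helpers).
# Assumption: both speedup ratios are relative to the Llama 3.1 8B figure.

def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate, in seconds."""
    return num_tokens / tokens_per_second


def implied_baseline(tokens_per_second: float, speedup: float) -> float:
    """Throughput a competitor would have if the quoted speedup holds."""
    return tokens_per_second / speedup


CEREBRAS_8B = 1800.0   # tokens/s, Llama 3.1 8B (announced figure)
CEREBRAS_70B = 450.0   # tokens/s, Llama 3.1 70B (announced figure)

gpu_8b = implied_baseline(CEREBRAS_8B, 20.0)   # 90 tokens/s
groq_8b = implied_baseline(CEREBRAS_8B, 2.4)   # 750 tokens/s

rows = [
    ("Cerebras 8B", CEREBRAS_8B),
    ("Cerebras 70B", CEREBRAS_70B),
    ("GPU 8B (implied)", gpu_8b),
    ("Groq 8B (implied)", groq_8b),
]

for name, tps in rows:
    print(
        f"{name:18s} {tps:7.1f} tok/s  "
        f"{per_token_latency_ms(tps):6.2f} ms/token  "
        f"{generation_time_s(1000, tps):6.2f} s per 1,000 tokens"
    )
```

Under these assumptions, a 1,000-token response takes roughly 0.56 s at 1,800 tokens per second versus about 11 s at the implied 90 tokens per second.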
Cerebras Inference looks impressive from top to bottom and clearly showcases the massive potential for innovation in AI inference. Competing with NVIDIA will require more than just faster chips, but Cerebras appears to be a legitimate challenger.

🔎 ML Research

The Mamba in the Llama
Researchers from Princeton University, Together AI, Cornell University and other academic institutions published a paper proposing a technique to distill and accelerate hybrid transformer-SSM models. The method distills large transformers into linear RNNs, yielding hybrid models that keep only a quarter of the original attention layers.

Diffusion Models as Real Time Game Engines
Google Research published a paper presenting GameNGen, a game engine powered entirely by a diffusion model that enables real-time interaction with a complex environment over long trajectories. GameNGen can simulate the game DOOM at over 20 frames per second on a single TPU.

LLMs that Learn from Mistakes
Researchers from Meta FAIR and Carnegie Mellon University published a paper outlining a technique that includes error-correction data directly in the pretraining stage in order to improve reasoning capabilities. The resulting model outperforms alternatives trained on error-free data.

Table Augmented Generation
In a new paper, researchers from UC Berkeley proposed Table-Augmented Generation (TAG), a method that addresses some of the limitations of text-to-SQL and RAG when answering questions over relational databases. TAG captures a far more complete set of interactions between an LLM and a database.

DisTrO
Nous Research published a paper introducing DisTrO, a family of distributed optimizers that reduces inter-GPU communication by up to five orders of magnitude. DisTrO is an important step toward low-latency training of large neural networks.

Brain Inspired Design
Microsoft Research published a summary of three recent research projects that simulate how the brain learns.
One project simulates how the brain computes information, another enhances accuracy and efficiency, and the third improves proficiency in language processing and pattern recognition.

🤖 AI Tech Releases

Qwen2-VL
Alibaba Research released Qwen2-VL, the new version of its marquee vision language model.

Cerebras Inference
Cerebras released an impressive inference solution that can generate up to 1,800 tokens per second on Llama 3.1 models.

NVIDIA NIM Blueprints
NVIDIA released NIM Blueprints, a series of templates to help enterprises get started with generative AI applications.

Gemini Models
Google DeepMind released a new series of experimental Gemini models.

Command R
Cohere released a new version of Command R with improvements in coding, math, reasoning and latency.

🛠 Real World AI

Recommendations at Netflix
Netflix discusses some of the AI techniques it uses to enhance long-term satisfaction with its content recommendations.