Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance
Was this email forwarded to you? Sign up here

Why does NVIDIA remain virtually unchallenged in the AI chip market?

Next Week in The Sequence:
You can subscribe to The Sequence below:

📝 Editorial: Cerebras Inference and the Challenges of Challenging NVIDIA’s Dominance

AI hardware is experiencing an innovation renaissance, with well-funded startups emerging everywhere. Yet NVIDIA remains virtually unchallenged, holding close to a 90% share of the AI chip market. Why is that? We have all heard explanations about the advantages of NVIDIA’s software stack for acceleration compared to platforms like AMD’s, but that feels like a lazy explanation for why NVIDIA keeps out-innovating its competitors. A simpler theory, which I have discussed with several scientists and engineers who pretrain large foundation models, is that NVIDIA is the only platform receiving regular feedback about chip performance during pretraining runs with tens of thousands of GPUs. At that scale, many challenges arise that are nearly impossible to simulate on smaller clusters. I will elaborate on that theory in a future post, but the main point is that the barrier to entry for challenging NVIDIA chips in pretraining is very high. The only viable candidate seems to be Google’s TPUs, which have definitely been tested at massive scale.

If pretraining is out of the equation, the obvious area to explore is inference. Here we have a completely different playing field, where performance optimizations can be applied at a smaller scale, making it more conducive to startup disruption. One of the viable challengers to NVIDIA’s dominance in AI inference is Cerebras. Just last week, the well-funded startup unveiled Cerebras Inference, a solution capable of delivering Llama 3.1 8B at roughly 1,800 tokens per second and Llama 3.1 70B at 450 tokens per second. This is approximately 20x faster than NVIDIA GPU-based solutions and about 2.4x faster than Groq. The magic behind Cerebras’ performance is its wafer-scale chip design, which allows the entire model to be stored on-chip, eliminating the need for inter-GPU communication. Cerebras Inference looks impressive from top to bottom and clearly showcases the massive potential for innovation in AI inference. Competing with NVIDIA will require more than just faster chips, but Cerebras appears to be a legitimate challenger (a rough sketch of how such tokens-per-second figures are measured appears after the news sections below).

🔎 ML Research

The Mamba in the Llama
Researchers from Princeton University, Together AI, Cornell University and other academic institutions published a paper proposing a technique to distill transformers into hybrid SSM models and accelerate them. The method distills transformers into RNN equivalents while reusing only a quarter of the attention layers —> Read more.

Diffusion Models as Real Time Game Engines
Google Research published a paper presenting GameNGen, a game engine powered by diffusion models trained on interactions with real environments over long trajectories. GameNGen can simulate the game DOOM at over 20 frames per second on a single TPU —> Read more.

LLMs that Learn from Mistakes
Researchers from Meta FAIR and Carnegie Mellon University published a paper outlining a technique that includes error-correction data directly in the pretraining stage in order to improve reasoning capabilities. The resulting models outperform alternatives trained on error-free data —> Read more.

Table Augmented Generation
In a new paper, researchers from UC Berkeley proposed table-augmented generation (TAG), a method that addresses some of the limitations of text-to-SQL and RAG for answering questions over relational databases. The TAG model captures a much more complete set of interactions between an LLM and a database —> Read more.
DisTrO
Nous Research published a paper introducing DisTrO, an architecture that reduces inter-GPU communication requirements by up to five orders of magnitude. DisTrO is an important step toward training large neural networks over low-bandwidth connections —> Read more.

Brain Inspired Design
Microsoft Research published a summary of recent work across three projects that simulate how the brain learns. One project models how the brain computes information, another enhances accuracy and efficiency, and the third improves proficiency in language processing and pattern recognition —> Read more.

🤖 AI Tech Releases

Qwen2-VL
Alibaba released Qwen2-VL, a new version of its marquee vision-language model —> Read more.

Cerebras Inference
Cerebras released an impressive inference solution that can generate 1,800 tokens per second on Llama 3.1 models —> Read more.

NVIDIA NIM Blueprints
NVIDIA released NIM Blueprints, a series of templates to help enterprises get started with generative AI applications —> Read more.

Gemini Models
Google DeepMind released a new series of experimental Gemini models —> Read more.

Command R
Cohere released a new version of Command R with improvements in coding, math, reasoning and latency —> Read more.

🛠 Real World AI

Recommendations at Netflix
Netflix discusses some of the AI techniques it uses to enhance long-term satisfaction with its content recommendations —> Read more.

📡 AI Radar
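For readers who want to sanity-check tokens-per-second claims like the ones discussed in this issue, here is a minimal, hedged sketch of how output throughput is commonly measured against an OpenAI-compatible streaming endpoint. The base URL, model id, environment-variable names, and prompt below are placeholders of my own, not values confirmed by Cerebras or any provider mentioned above.

# Rough throughput-measurement sketch; all endpoint details are assumptions.
import os
import time

from openai import OpenAI  # pip install openai

# Hypothetical OpenAI-compatible endpoint; substitute your provider's real values.
client = OpenAI(
    base_url=os.environ.get("INFERENCE_BASE_URL", "https://example-inference.api/v1"),
    api_key=os.environ.get("INFERENCE_API_KEY", "sk-placeholder"),
)

MODEL = "llama-3.1-8b"  # placeholder model id; providers name models differently

start = time.perf_counter()
chunks_with_content = 0   # rough proxy: many servers emit roughly one token per chunk
reported_tokens = None    # exact count, if the server honors include_usage

stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain speculative decoding in 200 words."}],
    stream=True,
    stream_options={"include_usage": True},  # not all compatible servers support this
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks_with_content += 1
    if getattr(chunk, "usage", None):
        reported_tokens = chunk.usage.completion_tokens

elapsed = time.perf_counter() - start
tokens = reported_tokens or chunks_with_content
print(f"~{tokens} output tokens in {elapsed:.2f}s ≈ {tokens / elapsed:.0f} tokens/second")

Keep in mind that a single streamed request measured this way includes network latency and time-to-first-token, so it will usually understate a vendor's peak decode-rate numbers, which are typically reported under more favorable conditions.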
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.
Older messages

📝 Guest Post: Will Retrieval Augmented Generation (RAG) Be Killed by Long-Context LLMs?*
Friday, August 30, 2024
Pursuing innovation and supremacy in AI shows no signs of slowing down.

Edge 426: Reviewing Google DeepMind’s New Tools for AI Interpretability and Guardrailing
Thursday, August 29, 2024
Gemma Scope and ShieldGemma are some of the latest additions to DeepMind's Gemma stack

Edge 425: Inside Mamba, the Most Famous SSM Model
Tuesday, August 27, 2024
In this issue:

Black Forest Labs
Sunday, August 25, 2024
The startup powering image generation for xAI's Grok.

Edge 424: How DeepMind's AlphaProof and AlphaGeometry-2 Achieved Silver Medal Status in the International Math Oly…
Thursday, August 22, 2024
One model focuses on algebra and number theory, while the other mastered geometry.