TheSequence - You Need to Know About Groq
📝 Editorial: Groq’s Massive Milestone

Making inference fast is one of the north stars for the next generation of AI infrastructure providers. The default assumption is that the AI inference market will be dominated by NVIDIA, and while that might turn out to be correct, there is innovation happening at every level of the AI infrastructure stack. One of the most intriguing newcomers in this space is Groq, a startup whose name has become synonymous with fast AI inference.

Groq is the maker of the Language Processing Unit (LPU), a chip optimized for fast AI inference. The Groq LPU is a single-core processor designed for LLMs, interconnected with a fast switchless routing fabric using 288 QSFP28 optical cables. A rack is built from 9 GroqNode 1 servers (with 1 server acting as a redundant resource), featuring a fully connected internal RealScale network delivering accelerated compute performance of up to 48 PetaOPs (INT8) or 12 PFLOPs (FP16). Groq Cloud is an LPU-based cloud offering that already includes many of the top generative AI models and counts over 350,000 developers.

Groq is fast. In Groq Cloud, models can generate over 500 tokens per second, about a 10x improvement over GPT-4.

The rise of Groq might seem sudden given the frantic pace of the AI market. However, the startup has been working on LPUs since 2016, and that perseverance has been rewarded. Last week, Groq announced a $640 million funding round led by BlackRock, along with Neuberger Berman, Type One Ventures, Cisco, KDDI, and Samsung Catalyst Fund. AI hardware is a tough market, but Groq now has the resources to innovate and compete. AI inference is still NVIDIA’s world, but Groq is a fascinating new player in it.

🔎 ML Research

Robots for Table Tennis
Google DeepMind published a paper introducing the techniques behind the first robot agent to achieve human-competitive performance in table tennis. The paper details techniques such as hierarchical policy learning and sim-to-real transfer, among others, that are combined in a very clever way.

CodexGraph
Alibaba Research published a paper introducing CodexGraph, a system that integrates LLMs with a graph database built from code repositories. The graph model allows LLMs to navigate more sophisticated code structures and tackle more complex tasks.

GENEVA
Microsoft Research published a paper introducing GENEVA, a tool that can generate rich narrative graphs from a high-level description and a set of constraints. GENEVA explores different narrative paths through a visual graph interface.

RAG Foundry
Researchers from Intel published a paper detailing RAG Foundry, a framework for streamlining RAG use cases. RAG Foundry combines data creation, inference, and evaluation in a single workflow.

LLM Scaling Without Increasing Parameters
Google DeepMind published a paper discussing the importance of scaling test-time computation in order to scale LLMs. The paper explores increasing test-time computation by searching against dense, process-based verifier reward models and by updating the model's distribution over responses based on the prompt at test time.

Self-Taught Evaluators
Meta AI published a paper proposing an LLM-as-a-judge technique that improves evaluators without human annotations, using only synthetic training data. The method trains an LLM to produce reasoning traces and final judgments, and repeats that process iteratively to obtain improved predictions.

🤖 AI Tech Releases

OpenAI API Structured Outputs
OpenAI unveiled a new feature that enables structured JSON outputs as part of its API.

🛠 Real World AI

Prompt Poet
Character.ai open sourced Prompt Poet, their framework for prompt design.

Smart Notifications at Pinterest
Pinterest discusses the machine learning techniques used in their notification systems.

📡 AI Radar