TheSequence - Some Cool Details About Llama 3
Solid performance, new tokenizer, fairly optimal training and other details about Meta AI's new model.
📝 Editorial: Some Cool Details About Llama 3

I had an editorial prepared for this week’s newsletter, but then Meta AI released Llama 3! Such are the times we live in. Generative AI is evolving on a weekly basis, and Llama 3 is one of the most anticipated releases of the past few months. Since the debut of the original version, Llama has become one of the foundational blocks of the open source generative AI space. I prefer the term "open models," given that these releases are not completely open source, but that’s just my preference.

The release of Llama 3 builds on incredible momentum within the open model ecosystem and brings its own innovations. The 8B and 70B versions are available now, and a version with over 400B parameters is still being trained.

The Llama 3 architecture is a decoder-only transformer with a new, highly optimized tokenizer that has a 128K-token vocabulary. This is notable because, with few exceptions, most large language models simply reuse existing tokenizers. The new tokenizer encodes text more efficiently, which leads to major performance gains. Another improvement is grouped query attention, which Llama 2 used only in its larger models and Llama 3 now applies to both the 8B and 70B sizes. Grouped query attention improves inference performance by letting several query heads share each key/value head, which shrinks the key-value cache held in memory during generation. The context window has also doubled, to 8,192 tokens.

Training is one area in which Llama 3 drastically improves over its predecessors. The model was trained on over 15 trillion tokens, a very large corpus for an 8B parameter model, which speaks to the level of optimization Meta achieved in this release. It's interesting to note that only slightly more than 5% of the training corpus consisted of non-English tokens. The training infrastructure ran on 16,000 GPUs and sustained over 400 TFLOPS per GPU, which is nothing short of monumental.

Llama 3 is a very welcome addition to the open model generative AI stack.
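To make the grouped query attention point above a bit more concrete, here is a rough back-of-the-envelope sketch of how sharing key/value heads shrinks the key-value cache. The configuration values (32 layers, 32 query heads, 8 key/value heads, head dimension 128, an 8,192-token sequence, 16-bit values) and the helper name kv_cache_bytes are assumptions chosen for illustration, not figures from this editorial.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate key-value cache size in bytes for a single sequence.

    Each layer caches two tensors (keys and values), each holding
    seq_len * num_kv_heads * head_dim values.
    """
    return 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_value


# Illustrative configuration (assumed for this sketch, not from the editorial).
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8      # grouped query attention: 4 query heads share each KV head
HEAD_DIM = 128
SEQ_LEN = 8192

# Standard multi-head attention keeps one KV head per query head.
mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped query attention keeps only the shared KV heads.
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
reduction = mha_bytes / gqa_bytes

print(f"Multi-head attention cache:    {mha_bytes / 2**30:.1f} GiB")
print(f"Grouped query attention cache: {gqa_bytes / 2**30:.1f} GiB")
print(f"Reduction factor:              {reduction:.0f}x")
```

Under these assumed numbers, the cache for one sequence drops from 4 GiB to 1 GiB, which is why the technique mostly pays off at inference time, when long sequences and many concurrent requests compete for GPU memory.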
The initial benchmark results are quite impressive, and the 400B version could rival GPT-4. Distribution is one area where Meta excelled in this release, making Llama 3 available on all major machine learning platforms. It's been just a few hours, and we are already seeing open source innovations built on Llama 3. The momentum in the generative AI open models space definitely continues, even if it forced me to rewrite the entire editorial. 😊

🔎 ML Research

VASA-1
Microsoft Research published a paper detailing VASA-1, a framework for generating talking faces from static images and audio clips. The model is able to generate facial movements, such as head and lip motion, in a very expressive way.

Zamba
Zyphra published a paper introducing Zamba, a 7B SSM model. Zamba introduces a new architecture that combines Mamba blocks with attention layers, which leads to high performance in training and inference with lower computational resources.

MEGALODON
AI researchers from Meta and Carnegie Mellon University published a paper introducing MEGALODON, a new architecture that can scale to virtually unlimited context windows. As its name indicates, MEGALODON is based on the MEGA architecture with an improved gated attention mechanism.

SAMMO
Microsoft Research published a paper detailing Structure-Aware Multi-objective Metaprompt Optimization (SAMMO), a framework for prompt optimization. The framework is able to optimize prompts for scenarios such as RAG or instruction tuning.

Infini-Attention
Google Research published a paper introducing Infini-Attention, a method to scale the context window in transformer architectures to virtually unlimited levels. The method adds a compressive memory to the attention layer, which allows long-term and masked local attention to be combined in a single transformer block.

AI Agents Ethics
Google DeepMind published a paper discussing ethical considerations in AI assistants. The paper covers aspects such as alignment, safety, and misuse.

🤖 Cool AI Tech Releases

Llama 3
Meta AI introduced the highly anticipated Llama 3 model.

Stable Diffusion 3
Stability AI launched the APIs for Stable Diffusion 3 as part of its developer platform.

Reka Core
Reka, an AI startup built by former DeepMind engineers, announced its Reka Core multimodal models.

OpenEQA
Meta AI released OpenEQA, a benchmark for visual language models in physical environments.

Gemini Cookbook
Google open sourced the Gemini Cookbook, a series of examples for interacting with the Gemini API.

🛠 Real World ML

AI Privacy at Slack
Slack discusses the architecture enabling privacy capabilities in its AI platform.