The Sequence Radar #501: DeepSeek 5 New Open Source Releases
Some of the techniques used in R1 are now open source.

Next Week in The Sequence:
Our series about RAG continues with an exploration of hypothetical document embeddings. We discuss a new agentic framework that was just released in our engineering edition. The research edition dives into DeepMind’s amazing AlphaGeometry2. Our opinion day is going to explore a fascinating topic: do we need new programming languages for AI?

📝 Editorial: DeepSeek 5 New Open Source Releases
In a week dominated by OpenAI and Anthropic unveiling new models, let’s shift our focus to something different. Do you really need another newsletter dissecting GPT-4.5? What flew under the radar this week was DeepSeek’s impressive series of five open-source releases. These contributions focus on optimizations derived from their flagship R1 model, showcasing just how technically formidable this team is when it comes to AI efficiency. Let’s break them down:
These open-source contributions underline DeepSeek’s commitment to fostering an open and collaborative AI ecosystem. The impact has been immediate: FlashMLA, for instance, amassed over 5,000 stars on GitHub within just six hours of its release. While the industry’s attention was fixed on proprietary advancements, DeepSeek made a powerful statement about the role of open-source innovation in AI’s future.

📶 AI Eval of the Week
A few months ago, I co-founded LayerLens (still in stealth mode, but follow us on X to stay tuned) to streamline the benchmarking and evaluation of foundation models. I am learning a great deal about these models by regularly running evaluations, so I decided to share some of those learnings. Have you heard about Humanity’s Last Exam? It is one of the toughest benchmarks ever created, with contributions from over 1,000 domain experts. How difficult is it exactly? Well, look at the performance of some DeepSeek, OpenAI, Google and Anthropic models, all scoring less than 5%.

🔎 AI Research

CodeCriticBench
In the paper CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models, researchers from Alibaba and other AI labs introduce CodeCriticBench, a benchmark for evaluating the code critique capabilities of Large Language Models (LLMs). It includes code generation and code QA tasks with basic and advanced critique evaluations.

SWE-RL
In the paper SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution, researchers from Meta FAIR introduce SWE-RL, a reinforcement learning (RL) method to improve LLMs on software engineering (SE) tasks using software evolution data and rule-based rewards. The resulting model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified.
BigBench Extra Hard
In the paper BIG-Bench Extra Hard, researchers from Google DeepMind introduce BIG-Bench Extra Hard (BBEH), a benchmark designed to assess advanced reasoning capabilities of large language models (LLMs). BBEH builds upon the BIG-Bench Hard (BBH) benchmark by replacing each of the 23 tasks with a novel, more difficult counterpart.

Deep Research Tech Report
In the Deep Research System Card, OpenAI introduces deep research, a new agentic capability that conducts multi-step research on the internet for complex tasks. It leverages reasoning to search, interpret, and analyze text, images, and PDFs, and can also read user-provided files and analyze data using Python code.

Phi-4-Mini Technical Report
In the Phi-4-Mini Technical Report, Microsoft introduces Phi-4-Mini and Phi-4-Multimodal, compact yet capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model, and Phi-4-Multimodal integrates text, vision, and speech/audio input modalities into a single model using a mixture-of-LoRAs technique.

Magma
In the paper Magma: A Foundation Model for Multimodal AI Agents, Microsoft Research presents Magma, a multimodal AI model that understands and acts on inputs to complete tasks in digital and physical environments. Magma uses Set-of-Mark and Trace-of-Mark techniques during pretraining to enhance spatial-temporal reasoning, enabling strong performance in UI navigation and robotic manipulation tasks.

🤖 AI Tech Releases

DeepSeek Open Source Week
DeepSeek published five open source releases this week.

GPT-4.5
OpenAI released a preview of GPT-4.5 with new capabilities and a fairly high API price.

Claude 3.7 Sonnet
Anthropic released a new version of its Sonnet model.

Granite 3.2
IBM open sourced the new version of its Granite models, which include reasoning, time series forecasting and vision capabilities.

OctoTools
Stanford University open sourced OctoTools, a new agentic framework optimized for reasoning and tool usage.
Qodo Embed
Qodo-Embed-1-1.5B is a new 1.5 billion parameter code embedding model that matches OpenAI’s performance.

🛠 Real World AI

New Alexa
Amazon shared some details about how they built the new version of Alexa.

📡 AI Radar