The Sequence Radar #501: DeepSeek 5 New Open Source Releases
Some of the techniques used in R1 are now open source.
Next Week in The Sequence:
Our series about RAG continues with an exploration of hypothetical document embeddings. We discuss a new agentic framework that was just released in our engineering edition. The research edition dives into DeepMind’s amazing AlphaGeometry2. Our opinion day is going to explore a fascinating topic: do we need new programming languages for AI?
📝 Editorial: DeepSeek 5 New Open Source Releases
In a week dominated by OpenAI and Anthropic unveiling new models, let’s shift our focus to something different. Do you really need another newsletter dissecting GPT-4.5? What flew under the radar this week was DeepSeek’s impressive series of five open-source releases. These contributions focus on optimizations derived from their flagship R1 model, showcasing just how technically formidable this team is when it comes to AI efficiency. Let’s break them down:
These open-source contributions underline DeepSeek’s commitment to fostering an open and collaborative AI ecosystem. The impact has been immediate: FlashMLA, for instance, amassed over 5,000 stars on GitHub within just six hours of its release. While the industry’s attention was fixed on proprietary advancements, DeepSeek made a powerful statement about the role of open-source innovation in AI’s future.
📶 AI Eval of the Week
A few months ago, I co-founded LayerLens (still in stealth mode, but follow us on X to stay tuned) to streamline the benchmarking and evaluation of foundation models. I can’t tell you how much I am learning about these models by regularly running evaluations, so I decided to share some of those learnings. Have you heard about Humanity’s Last Exam? This is one of the toughest benchmarks ever created, with contributions from over 1,000 domain experts. How difficult is it exactly? Well, look at the performance of some DeepSeek, OpenAI, Google and Anthropic models, all scoring less than 5%.
🔎 AI Research
CodeCriticBench
In the paper CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models, researchers from Alibaba and other AI labs introduce CodeCriticBench, a benchmark for evaluating the code critique capabilities of Large Language Models (LLMs). It includes code generation and code QA tasks with basic and advanced critique evaluations.
SWE-RL
In the paper SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution, researchers from Meta FAIR introduce SWE-RL, a reinforcement learning (RL) method to improve LLMs on software engineering (SE) tasks using software evolution data and rule-based rewards. The resulting model, Llama3-SWE-RL-70B, achieves a 41.0% solve rate on SWE-bench Verified.
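To make the idea of a rule-based reward a bit more concrete, here is a minimal sketch of how a generated patch could be scored against a reference patch without any learned reward model. The function name `patch_reward`, the format check, and the -1.0 penalty are assumptions chosen for illustration; the summary above does not spell out the exact rules, so this should not be read as the paper's implementation.

```python
import difflib


def patch_reward(predicted_patch: str, reference_patch: str) -> float:
    """Hypothetical rule-based reward for a generated code patch.

    Assumed rules (illustrative only):
    - output with no added or removed line is treated as malformed -> -1.0
    - otherwise the reward is the textual similarity to the reference, in [0, 1]
    """
    lines = predicted_patch.splitlines()
    # A usable diff needs at least one "+" or "-" line that is not a file header.
    has_change = any(
        (line.startswith("+") and not line.startswith("+++"))
        or (line.startswith("-") and not line.startswith("---"))
        for line in lines
    )
    if not has_change:
        return -1.0
    # SequenceMatcher.ratio() returns 1.0 for identical strings.
    return difflib.SequenceMatcher(None, predicted_patch, reference_patch).ratio()


reference = (
    "--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n"
    "-return a - b\n+return a + b\n"
)
near_miss = (
    "--- a/app.py\n+++ b/app.py\n@@ -1 +1 @@\n"
    "-return a - b\n+return a * b\n"
)

exact = patch_reward(reference, reference)
partial = patch_reward(near_miss, reference)
malformed = patch_reward("I think the bug is in app.py", reference)
print(exact, partial, malformed)
```

The appeal of a reward like this is that it is cheap and deterministic: it needs only the historical patch as ground truth, not an executable test environment.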
BIG-Bench Extra Hard
In the paper BIG-Bench Extra Hard, researchers from Google DeepMind introduce BBEH, a benchmark designed to assess advanced reasoning capabilities of large language models (LLMs). BBEH builds upon the BIG-Bench Hard (BBH) benchmark by replacing each of the 23 tasks with a novel, more difficult counterpart.
Deep Research Tech Report
In the Deep Research System Card, OpenAI introduces deep research, a new agentic capability that conducts multi-step research on the internet for complex tasks. It leverages reasoning to search, interpret, and analyze text, images, and PDFs, and can also read user-provided files and analyze data using Python code.
Phi-4-Mini Technical Report
In the Phi-4-Mini Technical Report, Microsoft introduces Phi-4-Mini and Phi-4-Multimodal, compact yet capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model, and Phi-4-Multimodal integrates text, vision, and speech/audio input modalities into a single model using a mixture-of-LoRAs technique.
Magma
In the paper Magma: A Foundation Model for Multimodal AI Agents, Microsoft Research presents Magma, a multimodal AI model that understands and acts on inputs to complete tasks in digital and physical environments. Magma uses Set-of-Mark and Trace-of-Mark techniques during pretraining to enhance spatial-temporal reasoning, enabling strong performance in UI navigation and robotic manipulation tasks.
🤖 AI Tech Releases
DeepSeek Open Source Week
DeepSeek made five open-source releases this week.
GPT-4.5
OpenAI released a preview of GPT-4.5 with new capabilities and a fairly high API price.
Claude 3.7 Sonnet
Anthropic released a new version of its Sonnet model.
Granite 3.2
IBM open sourced the new version of its Granite models, which include reasoning, time series forecasting and vision capabilities.
OctoTools
Stanford University open sourced OctoTools, a new agentic framework optimized for reasoning and tool usage.
Qodo Embed
Qodo-Embed-1-1.5B is a new 1.5-billion-parameter code embedding model that matches the performance of OpenAI’s embedding models.
🛠 Real World AI
New Alexa
Amazon shared some details about how they built the new version of Alexa.
📡 AI Radar