TheSequence - Don't Overlook China's Open Source LLMs
📝 Editorial: Don't Overlook China's Open Source LLMs
If you visit the open LLM leaderboard today, you might encounter an unfamiliar model at the top of the charts: Smaug-72B. Open-sourced by Abacus AI, this model is a fine-tuned version of another model, Qwen-72B, which Alibaba released a few months ago. The Qwen family of open-source LLMs has scored incredibly high across some of the top open-source benchmarks, showcasing the latest examples of Chinese innovation in the open-source generative AI space. While open-source LLMs are typically associated with Western models like LLaMA or Mistral, the pace of high-quality releases from China is nothing short of remarkable. Here are a few examples:
Smaug was technically developed by an American company, but as a fine-tuned version of a Chinese model. From what I can tell, most open-source Chinese LLMs share strong architectural commonalities with models like Llama or Mistral; there hasn't been any major innovation from an architectural standpoint. Nonetheless, the quality is undeniable. While many skeptics of open-source generative AI regularly cite China as a major concern, they fail to recognize the contributions that Chinese research labs and startups will make to the space. It will be interesting to see how regulation shapes the evolution of open-source LLMs in China and Western countries. For now, don't overlook the Chinese open-source LLMs. They are very impressive.

🎥 Watch Now: Building Plaid’s ML Fraud Detection Application
Want to learn about Plaid’s ML platform journey? In this on-demand recording, Plaid Software Engineer Renault Young shares the technical challenges the team faced, how they set up the data foundations they needed to start building an ML platform, what they used to look for patterns in transaction data in real time, and more. Today, Signal is Plaid’s biggest ML application and analyzes 1000+ risk factors per ACH transaction. The on-demand recording is now available for you to watch and share with your colleagues!

🔎 ML Research

Specialized SLMs
Apple Research published a paper evaluating small language model architectures based on inference, specialization and training budgets. The paper evaluates different architectures, such as hyper-networks or mixture of experts, to achieve different levels of specialization under different budget constraints.

Chain-of-Abstraction
Meta AI Research published a paper detailing Chain of Abstraction (CoA), a method that combines reasoning and tool learning in LLMs. CoA creates abstract placeholders in reasoning chains and then fills them with specific knowledge using tools.
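To make the placeholder idea concrete, here is a minimal Python sketch of the filling step. It is only an illustration under stated assumptions: the bracketed placeholder syntax, the `fill_chain` function and the toy calculator "tool" are all hypothetical, and the abstract chain is hard-coded here rather than generated by an LLM, so this should not be read as the paper's implementation.

```python
import operator
import re

# Hypothetical toy "tool": a calculator limited to one binary operation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Assumed placeholder syntax, e.g. "[20 + 35 = y1]" or "[y1 - 12 = y2]".
PLACEHOLDER = re.compile(r"\[(\w+(?:\.\d+)?) ([-+*/]) (\w+(?:\.\d+)?) = (\w+)\]")


def fill_chain(abstract_chain: str) -> str:
    """Replace each abstract placeholder with the value computed by the tool."""
    values = {}  # results of earlier placeholders, keyed by variable name

    def resolve(token: str) -> float:
        # A token is either an earlier variable (y1) or a numeric literal.
        return values[token] if token in values else float(token)

    def run_tool(match: re.Match) -> str:
        left, op, right, name = match.groups()
        result = OPS[op](resolve(left), resolve(right))
        values[name] = result  # later placeholders may refer to this name
        return f"{result:g}"

    # re.sub visits matches left to right, so y1 is known before y2 needs it.
    return PLACEHOLDER.sub(run_tool, abstract_chain)


chain = "She has [20 + 35 = y1] apples, and after selling 12 she keeps [y1 - 12 = y2]."
print(fill_chain(chain))
```

The point of the split is that the reasoning chain can be written once in abstract form, while the concrete values come from a tool call rather than from the model's own arithmetic.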
Mastering Chess Without Search
Researchers from Google DeepMind published a paper proposing a 270 million parameter transformer model that was able to play chess at a grandmaster level. The model challenges traditional approaches to chess that relied on massive game datasets and complex heuristics.

Self-Discover
Google DeepMind published a paper introducing Self-Discover, a framework to tackle complex reasoning problems with LLMs. The framework includes reasoning modules such as critical and step-by-step thinking, as well as the building blocks to compose those modules into sophisticated reasoning chains.

AI Controller Interface
Microsoft Research released a prototype of AI Controller Interface (AICI), a framework to implement controllers that constrain the outputs of LLMs. AICI’s architecture allows custom logic blocks to run during the token decoding process while still maintaining the state of the LLM.

🤖 Cool AI Tech Releases

Smaug-72B
Abacus AI released Smaug-72B, which sits at the top of the open LLM leaderboard.

Gemini Advanced
Google rebranded Bard as Gemini and introduced Gemini Advanced with native integration for Google Docs and Gmail.

TensorFlow GNN
Google released TensorFlow GNN, a new framework for graph neural networks in TensorFlow.

Imagen 2
Google released Imagen 2, its powerful text-to-image model, across several of its AI products.

SVD 1.1
Stability AI announced the release of SVD 1.1, a new version of its video generation model optimized for consistency.

📡 AI Radar