Don't Overlook China's Open Source LLMs

A version of a Chinese LLM tops the open LLM leaderboard.

Feb 11

Image: An illustration of open-source AI language models in China, blending traditional Chinese colors and symbols with open-source iconography. Created using DALL-E.

Next Week in The Sequence:

Edge 369: Our series about LLM reasoning continues with the recently published Chain-of-Code (CoC) method. We review the original CoC paper by Google DeepMind and the popular Embedchain framework.
Edge 370: We dive into the new AlphaGeometry model created by Google DeepMind, which is able to solve geometry problems at the level of a math olympiad gold medalist.

📝 Editorial: Don't Overlook China's Open Source LLMs

If you visit the open LLM leaderboard today, you might encounter an unfamiliar model at the top of the charts: Smaug-72B. Open-sourced by Abacus AI, this model is a fine-tuned version of another model, Qwen-72B, which Alibaba released a few months ago. The Qwen family of open-source LLMs has scored incredibly high across some of the top open-source benchmarks, showcasing the latest examples of Chinese innovation in the open-source generative AI space. While open-source LLMs are typically associated with Western models like LLaMA or Mistral, the pace of high-quality releases from China is nothing short of remarkable. Here are a few examples:

01.ai, Kai-Fu Lee’s AI startup, open-sourced its Yi family of models, which top several benchmarks on open-source leaderboards.
DeepSeek AI released DeepSeek Chat, a 67B parameter model trained on 2 trillion English and Chinese tokens, followed by models for coding and math.
Alibaba open-sourced several versions of its Qwen LLM models with impressive performance.
Tiger Research open-sourced Tigerbot-70B-chat, built on top of Llama 2.

Smaug was technically developed by an American company, but as a fine-tuned version of a Chinese model. From what I can tell, most open-source Chinese LLMs share strong architectural commonalities with models like Llama or Mistral, and there hasn't been any major innovation from an architectural standpoint. Nonetheless, the quality is undeniable. While many skeptics of open-source generative AI regularly cite China as a major concern, they fail to recognize the contributions that Chinese research labs and startups will make to the space. It will be interesting to see how regulation plays a role in the evolution of open-source LLMs in China and Western countries. For now, don't overlook the Chinese open-source LLMs. They are very impressive.

🎥 Watch Now: Building Plaid’s ML Fraud Detection Application

Want to learn about Plaid’s ML platform journey? In this on-demand recording, Plaid Software Engineer Renault Young shared the technical challenges they faced, how they set up the data foundations they needed to start building an ML platform, what they used to look for patterns in transaction data in real time, and more. Today, Signal is Plaid’s biggest ML application and analyzes 1000+ risk factors per ACH transaction.

The on-demand recording is now available for you to watch and share with your colleagues!

🔎 ML Research

Specialized SLMs

Apple Research published a paper evaluating small language model architectures based on inference, specialization, and training budgets. The paper evaluates different architectures such as hyper-networks or mixture of experts to achieve different levels of specialization based on budget constraints —> Read more.

Chain-of-Abstraction

Meta AI Research published a paper detailing Chain of Abstraction (CoA), a method that combines reasoning and tool learning in LLMs. CoA creates abstract placeholders in reasoning chains and then fills them with specific knowledge using tools —> Read more.
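
To make the placeholder idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the bracketed chain format, the `y1`/`y2` placeholder names, and the toy `calculator` tool are all assumptions chosen for illustration. The chain is written first with abstract placeholders, and a tool then fills them in order.

```python
import re

# Hypothetical abstract chain: each bracketed step names a placeholder (y1, y2)
# instead of a concrete value. The format is an assumption for this sketch.
ABSTRACT_CHAIN = "Total apples: [20 + 35 = y1]. After doubling: [y1 * 2 = y2]."

# Matches steps of the form "[<expression> = <placeholder>]".
STEP = re.compile(r"\[(.+?) = (y\d+)\]")


def calculator(expression):
    """Toy arithmetic tool: evaluates 'a op b' for integers a and b."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]


def fill_chain(chain):
    """Replace each abstract step with the value computed by the tool."""
    values = {}

    def resolve(match):
        expression, name = match.group(1), match.group(2)
        # Substitute placeholders that earlier steps already resolved.
        for known, value in values.items():
            expression = re.sub(rf"\b{known}\b", str(value), expression)
        values[name] = calculator(expression)
        return str(values[name])

    # re.sub calls resolve on matches from left to right, so y1 is known
    # by the time the step that uses it is processed.
    return STEP.sub(resolve, chain)


print(fill_chain(ABSTRACT_CHAIN))
```

Separating the abstract chain from the tool calls means the reasoning structure can be produced once and the concrete values filled in afterwards, which is the split the summary above describes.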

Mastering Chess Without Search

Researchers from Google DeepMind published a paper proposing a 270 million parameter transformer model that was able to play chess at a grandmaster level. The model challenges traditional approaches to chess that rely on explicit search and complex heuristics —> Read more.

Self-Discover

Google DeepMind published a paper introducing Self-Discover, a framework to tackle complex reasoning problems with LLMs. The framework includes reasoning modules such as critical and step-by-step thinking as well as the building blocks to compose those modules into sophisticated reasoning chains —> Read more.

AI Controller Interface

Microsoft Research released a prototype of AI Controller Interface (AICI), a framework to implement controllers that constrain the outputs of LLMs. AICI’s architecture allows the implementation of custom logic blocks during the token decoding process while still maintaining the state of the LLM —> Read more.
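
As a rough illustration of what constraining outputs during decoding means, here is a toy Python sketch. It does not use AICI's actual API: the vocabulary, the `toy_logits` stand-in for a model, and the `YesNoController` class are hypothetical names invented for this example. The controller is consulted at every step and tokens it disallows are masked out before the next token is chosen.

```python
import math

# Tiny hypothetical vocabulary for the sketch.
VOCAB = ["yes", "no", "maybe", "<eos>"]


def toy_logits(prefix):
    """Stand-in for an LLM: fixed scores that prefer 'maybe', then '<eos>'."""
    if not prefix:
        return [1.0, 0.5, 2.0, 0.1]
    return [0.1, 0.1, 0.1, 3.0]


class YesNoController:
    """Hypothetical controller: first token must be 'yes' or 'no', then stop."""

    def allowed(self, prefix):
        return {"yes", "no"} if not prefix else {"<eos>"}


def constrained_decode(controller, max_tokens=4):
    """Greedy decoding where the controller masks disallowed tokens each step."""
    prefix = []
    for _ in range(max_tokens):
        logits = toy_logits(prefix)
        allowed = controller.allowed(prefix)
        # Disallowed tokens get -inf so they can never be selected.
        masked = [s if t in allowed else -math.inf for t, s in zip(VOCAB, logits)]
        token = VOCAB[masked.index(max(masked))]
        if token == "<eos>":
            break
        prefix.append(token)
    return prefix


print(constrained_decode(YesNoController()))
```

Without the controller, the toy model would pick "maybe" first; with it, the highest-scoring allowed token is chosen instead, while the decoding loop keeps its own running state.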

🤖 Cool AI Tech Releases

Smaug-72B

Abacus AI released Smaug-72B, which sits at the top of the open LLM leaderboard —> Read more.

Gemini Advanced

Google rebranded Bard as Gemini and introduced Gemini Advanced with native integration for Google Docs and Gmail —> Read more.

TensorFlow GNN

Google released TensorFlow GNN, a new framework for graph neural networks in TensorFlow —> Read more.

Imagen 2

Google released Imagen 2, its powerful text-to-image model, across several of its AI products —> Read more.

SVD 1.1

Stability AI announced the release of SVD 1.1, a new version of its video generation model optimized for consistency —> Read more.

📡 AI Radar

OpenAI CEO Sam Altman is reportedly raising $7 trillion for a new AI chip project.
AI compute startup Zededa announced $72 million in new funding.
Ambience Healthcare raised $70 million for its AI operating systems for healthcare organizations.
Software asset management company Xensam raised $40 million to build its next generation of AI capabilities.
AI clinical trial company Unlearn raised $50 million in Series C funding.
Jua raised $16 million for LLMs pretrained on weather and climate data.
AI agent platform Cimba.ai announced a $1.25 million pre-seed round.
Finally raised $10 million for applying AI to small business accounting.
Entrust is acquiring AI identity verification company Onfido for a reported $400 million.
3D generative AI platform Atlas announced $4.5 million in new funding.
Programming training platform CodeSignal unveiled its AI-powered CodeSignal Learn solution.


