👨🏼🎓👩🏽🎓 The Standard for Scalable Deep Learning Models
Weekly news digest curated by the industry insiders

📝 Editorial

Large deep learning models seem to be the norm these days. While deep neural networks with trillions of parameters are very attractive, they are nothing short of a nightmare to train. In most training techniques, the computational cost scales linearly with the number of parameters, resulting in impractical costs for most scenarios. In recent years, mixture of experts (MoE) has emerged as a powerful alternative. Conceptually, MoE operates by partitioning a task into subtasks, routing each input to specialized expert subnetworks, and aggregating their outputs. When applied to deep learning models, MoE has proven to scale sublinearly with respect to the number of parameters, making it one of the only viable options for scaling deep learning models to trillions of parameters.

The value proposition of MoE has sparked the creation of new frameworks supporting this technique. Facebook AI Research (FAIR) recently added MoE support to its fairseq toolkit for language models. Similarly, researchers from the famous Beijing Academy of Artificial Intelligence (BAAI) open-sourced FastMoE, an implementation of MoE in PyTorch. A few days ago, Microsoft Research jumped into the MoE space with the release of Tutel, an open-source library that uses MoE to enable the implementation of super-large deep neural networks. One of the best things about Tutel is that Microsoft didn't stop at the open-source release: it also deeply optimized the framework for the GPUs supported on the Azure platform, streamlining the adoption of this MoE implementation. Little by little, MoE is becoming the gold standard for large deep learning models.

🍂🍁 TheSequence Scope is our Sunday free digest.
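The routing idea behind MoE can be sketched in a few lines of plain Python. This is a toy illustration of top-1 gating, not the actual API of Tutel, FastMoE, or fairseq: a gate scores every expert, but only the single best-scoring expert runs per token, which is why compute grows far more slowly than parameter count.

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class MoELayer:
    """Toy mixture-of-experts layer with top-1 gating.

    Each expert is a plain linear map (a weight matrix stored as nested
    lists). Only the expert selected by the gate runs for each token, so
    per-token compute stays constant no matter how many experts exist.
    """

    def __init__(self, dim, num_experts):
        self.gate = [[random.gauss(0, 0.1) for _ in range(dim)]
                     for _ in range(num_experts)]
        self.experts = [
            [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        self.calls_per_expert = [0] * num_experts

    def forward(self, token):
        # Gate: one logit per expert, softmax into routing probabilities.
        logits = [sum(w * x for w, x in zip(row, token)) for row in self.gate]
        probs = softmax(logits)
        # Top-1 routing: run only the highest-probability expert.
        k = max(range(len(probs)), key=probs.__getitem__)
        self.calls_per_expert[k] += 1
        out = [sum(w * x for w, x in zip(row, token))
               for row in self.experts[k]]
        # Weight the output by the gate probability (this is what keeps
        # routing differentiable in a real implementation).
        return [probs[k] * o for o in out]

layer = MoELayer(dim=8, num_experts=16)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]
outputs = [layer.forward(t) for t in tokens]

# 32 tokens -> exactly 32 expert invocations in total, regardless of
# whether the layer holds 16 or 16,000 experts.
print(sum(layer.calls_per_expert))  # 32
```

Adding experts grows the parameter count without growing the per-token work; the engineering challenge the libraries above tackle is doing this routing efficiently across many GPUs.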
To receive high-quality educational content about the most relevant concepts, research papers and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🍂🍁

🗓 Next week in TheSequence Edge:

Edge#145: we discuss model observability and how it differs from model monitoring; we explore MLTrace, a reference architecture for observability in ML pipelines; we overview Arize AI, which lays the foundation for ML observability.
Edge#146: we deep dive into the Arize AI ML observability platform.

Now, let's review the most important developments in the AI industry this week.

🔎 ML Research

Deep Learning Demystified
The team from Walmart Labs published a remarkable blog post explaining the mathematical and computer science foundations of deep learning →read more on Walmart Global Tech blog

Predictive Text Selection and Federated Learning
Google Research published a blog post detailing how they used federated learning to improve the Smart Text Selection feature in Android →read more on Google Research blog

Safety Envelopes in Robotic Interactions
Carnegie Mellon University published a paper detailing a probabilistic technique for inferring surfaces that guarantee the safety of robots while interacting with objects in an environment →read more on Carnegie Mellon University blog

🤖 Cool AI Tech Releases

Tutel
Microsoft Research open-sourced Tutel, a high-performance mixture of experts (MoE) library for training massively large deep learning models →read more on Microsoft Research blog

GauGAN2
NVIDIA released a demo showcasing its GauGAN2 model, which can generate images from textual input →read more on NVIDIA blog

💸 Money in AI