👷‍♀️🧑🏻‍🎓👩‍💻👨🏻‍🏫 The MoE Momentum
Was this email forwarded to you? Sign up here

📝 Editorial

Massively large neural networks seem to be the pattern to follow these days in deep learning. The size and complexity of models are reaching unprecedented levels, particularly in models that try to master multiple tasks. Such large models are not only difficult to understand but also incredibly challenging to train and run without incurring significant computational expense. In recent years, mixture-of-experts (MoE) has emerged as one of the most efficient techniques for building and training large multi-task models. While MoE is not a novel ML technique, it has certainly experienced a renaissance with the rapid emergence of massively large deep learning models. Conceptually, MoE is rooted in the simple idea of decomposing a large multi-task network into smaller expert networks, each of which can master an individual task. This might sound similar to ensemble learning, but the key difference is that an MoE model routes each input to only one (or a few) of its expert networks, selected by a trainable gating function. The greatest benefit of MoE models is that their computational cost scales sub-linearly with their size: adding experts grows the parameter count without a proportional increase in the compute spent per input. As a result, MoE has become one of the most widely adopted architectures for large-scale models. Just this week, Microsoft and Google Research published papers outlining techniques to improve the scalability of MoE models. As big ML models continue to dominate the deep learning space, MoE techniques are likely to become more mainstream in real-world ML solutions.

🔺🔻 TheSequence Scope is our Sunday free digest. To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge: Edge#159: we recap our MLOps series (two parts!); Edge#160: we deep dive into Aporia, an ML observability platform.
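The routing idea behind MoE can be sketched in a few lines of NumPy. The toy layer below is a hypothetical illustration (top-1 gating with plain linear experts), not any specific paper's implementation; all names and shapes here are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

# One weight matrix per expert; in a real model each expert is a small MLP.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Top-1 MoE: route each token to its single highest-scoring expert.

    Only one expert runs per token, so per-token compute stays roughly
    constant as n_experts grows -- the sub-linear scaling noted above.
    """
    logits = x @ gate_w                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)      # softmax gate
    choice = probs.argmax(-1)                  # top-1 expert per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():                         # run expert e only on its tokens
            out[mask] = (x[mask] @ experts[e]) * probs[mask, e:e + 1]
    return out, choice

tokens = rng.standard_normal((5, d_model))
y, routed = moe_forward(tokens)
print(y.shape, routed)
```

Production systems (e.g. the DeepSpeed and TaskMoE work covered below) add load-balancing losses and distribute experts across devices, but the core gate-then-dispatch step is the same.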
Now, let’s review the most important developments in the AI industry this week

🔎 ML Research

Data2vec
Meta (Facebook) AI Research (FAIR) published a paper unveiling data2vec, a self-supervised learning model that mastered speech, language, and computer vision tasks →read more on FAIR blog

MoE Task Routing
Google Research published a paper introducing TaskMoE, a technique to extract smaller, more efficient subnetworks from large multi-task models based on mixture-of-experts (MoE) architectures →read more on Google Research blog

DeepSpeed and MoE
Microsoft Research published a detailed blog post on how to use its DeepSpeed framework to scale the training of mixture-of-experts (MoE) models →read more on Microsoft Research blog

StylEx – Visual Interpretability of Classifiers
Google Research published a paper proposing StylEx, a method to visualize the influence that individual attributes have on the output of ML classifiers →read more on Google Research blog

🤖 Cool AI Tech Releases

Macaw Demo
The Allen Institute for AI (AI2) open-sourced a demo that compares its Macaw model against OpenAI’s GPT-3 →read more on AI2 blog

🛠 Real World ML

AI Fairness at LinkedIn
The LinkedIn engineering team published details about how it integrates fairness as a first-class citizen of its AI products →read more on LinkedIn Engineering blog

🐦 Useful Tweet

In 2019, @quantumblack, our #AI firm, launched #Kedro, its first open-source software tool. Today, we’re taking the next step in our #opensource journey and donating Kedro to the Linux Foundation. Learn more ➡️ mck.co/3KkP1wB
#McKinseyonAI #MachineLearning

💸 Money in AI

AI-powered
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.