👷‍♀️🧑🏻‍🎓👩‍💻👨🏻‍🏫 The MoE Momentum
Was this email forwarded to you? Sign up here

📝 Editorial

Massively large neural networks seem to be the pattern to follow these days in deep learning. The size and complexity of models are reaching unprecedented levels, particularly in models that try to master multiple tasks. Such large models are not only difficult to understand but also incredibly challenging to train and run without incurring significant computational expense. In recent years, mixture-of-experts (MoE) has emerged as one of the most efficient techniques for building and training large multi-task models. While MoE is not a novel ML technique, it has certainly experienced a renaissance with the rapid emergence of massively large deep learning models. Conceptually, MoE is rooted in the simple idea of decomposing a large multi-task network into smaller expert networks, each of which can master an individual task. This might sound similar to ensemble learning, but the key difference is that an MoE model routes each input to only one (or a few) of its expert networks, selected by a trainable gating function. The greatest benefit of MoE models is that their computational cost scales sub-linearly with their size: adding experts grows the parameter count without a proportional increase in the compute spent per input. As a result, MoE has become one of the most widely adopted architectures for large-scale models. Just this week, Microsoft and Google Research published papers outlining techniques to improve the scalability of MoE models. As big ML models continue to dominate the deep learning space, MoE techniques are likely to become more mainstream in real-world ML solutions.

🔺🔻 TheSequence Scope is our Sunday free digest. To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge: Edge#159: we recap our MLOps series (two parts!); Edge#160: we deep dive into Aporia, an ML observability platform.
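The routing idea behind MoE can be sketched in a few lines of NumPy. The toy layer below is a hypothetical illustration (top-1 gating with plain linear experts), not any specific paper's implementation; all names and shapes here are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

# One weight matrix per expert; in a real model each expert is a small MLP.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Top-1 MoE: route each token to its single highest-scoring expert.

    Only one expert runs per token, so per-token compute stays roughly
    constant as n_experts grows -- the sub-linear scaling noted above.
    """
    logits = x @ gate_w                        # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)      # softmax gate
    choice = probs.argmax(-1)                  # top-1 expert per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():                         # run expert e only on its tokens
            out[mask] = (x[mask] @ experts[e]) * probs[mask, e:e + 1]
    return out, choice

tokens = rng.standard_normal((5, d_model))
y, routed = moe_forward(tokens)
print(y.shape, routed)
```

Production systems (e.g. the DeepSpeed and TaskMoE work covered below) add load-balancing losses and distribute experts across devices, but the core gate-then-dispatch step is the same.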
Now, let’s review the most important developments in the AI industry this week

🔎 ML Research

Data2vec
Meta (Facebook) AI Research (FAIR) published a paper unveiling data2vec, a self-supervised learning model that mastered speech, language, and computer vision tasks →read more on FAIR blog

MoE Task Routing
Google Research published a paper introducing TaskMoE, a technique to extract smaller, more efficient subnetworks from large multi-task models based on mixture-of-experts (MoE) architectures →read more on Google Research blog

DeepSpeed and MoE
Microsoft Research published a detailed blog post on how to use its DeepSpeed framework to scale the training of mixture-of-experts (MoE) models →read more on Microsoft Research blog

StylEx – Visual Interpretability of Classifiers
Google Research published a paper proposing StylEx, a method to visualize the influence that individual attributes have on the output of ML classifiers →read more on Google Research blog

🤖 Cool AI Tech Releases

Macaw Demo
The Allen Institute for AI (AI2) open-sourced a demo that compares its Macaw model against OpenAI’s GPT-3 →read more on AI2 blog

🛠 Real World ML

AI Fairness at LinkedIn
The LinkedIn engineering team published details about how it integrates fairness as a first-class citizen of its AI products →read more on LinkedIn Engineering blog

🐦 Useful Tweet

In 2019, @quantumblack, our #AI firm, launched #Kedro, its first open-source software tool. Today, we’re taking the next step in our #opensource journey and donating Kedro to the Linux Foundation. Learn more ➡️ mck.co/3KkP1wB
#McKinseyonAI #MachineLearning

💸 Money in AI

AI-powered
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.