👁👂🏻 Multi-Modal Learning is Becoming Real
Was this email forwarded to you? Sign up here

📝 Editorial

One of the cognitive marvels of the human brain is its ability to simultaneously process information from different sensory inputs, such as speech, touch, and vision, to accomplish a specific task. From infancy, we learn to build representations of the world from many different modalities: objects, sounds, verbal descriptions, and others. Recreating the ability to learn from different modalities simultaneously has long been a goal of ML, but most of those efforts remained confined to research exercises. For decades, most supervised ML models have been highly optimized for a single representation of the information. That's rapidly changing now: multimodal ML is becoming a reality. In the last two years, we have seen the emergence of multimodal ML models applied to real-world scenarios. Natural language and computer vision have proven a powerful combination, with the release of models such as OpenAI's DALL-E or NVIDIA's GauGAN. This week, Meta AI Research released a new model that combines audio and visual inputs to improve speech recognition. The model uses self-supervision techniques to analyze lip movements in unlabeled videos. That idea would have sounded insane a handful of years ago. While there are still plenty of milestones to reach in individual deep learning modalities, multimodal learning is an essential step toward the goal of building general AI. Little by little, such steps are making it more and more real.

🔺🔻 TheSequence Scope is our Sunday free digest.
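The core idea behind such audio-visual models can be illustrated with a toy late-fusion sketch. This is not Meta's actual implementation; the feature dimensions, the linear "recognizer", and all names here are hypothetical placeholders standing in for learned feature extractors and a transformer encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features: 40-dim audio features (e.g. filterbanks)
# and 64-dim visual embeddings of the lip region, for T aligned frames.
T = 100
audio_feats = rng.normal(size=(T, 40))
visual_feats = rng.normal(size=(T, 64))

# Simplest form of multimodal fusion: concatenate the per-frame features
# so a downstream model sees both modalities at once.
fused = np.concatenate([audio_feats, visual_feats], axis=1)

# A linear stand-in for the recognizer, mapping each fused frame
# to scores over 30 hypothetical output symbols.
W = rng.normal(size=(104, 30))
logits = fused @ W
print(logits.shape)
```

Real systems replace the concatenation and linear map with learned encoders trained via self-supervision, but the shape bookkeeping, aligning two modalities frame by frame and feeding them jointly to one model, is the same.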
To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#155: we discuss A/B testing for ML models; we explore how Meta AI uses ML A/B testing to improve its news feed ranking; we overview W&B, one of the top ML experimentation platforms on the market.

Edge#156: we deep dive into the ML mechanisms that power recruiting recommendations at LinkedIn.

Now, let's review the most important developments in the AI industry this week.

🔎 ML Research

Audio-Visual Models for Speech Recognition
Meta AI Research (FAIR) published a paper proposing a technique that uses both audio and vision to better understand speech →read more on the FAIR team blog

Hidden Agenda
Researchers from DeepMind and Harvard University proposed Hidden Agenda, a two-team social deduction game used to help reinforcement learning agents develop cooperative mechanics →read more in the original research paper

Transformers and Semi-Supervised Learning for Video
Amazon Research published two papers about novel video intelligence techniques powered by transformers and self-supervised learning →read more on the Amazon Research blog

Training Rescoring for Speech Recognition
Staying with Amazon Research: the tech giant published a paper proposing an NLU-based method to rescore training in the speech recognition models used in the Alexa digital assistant →read more on the Amazon Research blog

🤖 Cool AI Tech Releases

NVIDIA Canvas
Canvas, NVIDIA's generative art toolset, got a few updates this week →read more on the NVIDIA blog

NVIDIA Omniverse
NVIDIA Omniverse is a newly announced studio for creating virtual worlds →read more on the NVIDIA blog

🛠 Real World ML

Noted computer scientist Chip Huyen published a post detailing common challenges and solutions for real-time ML systems →read more in Huyen's original post

🐦 Follow us on Twitter, where we share
all our recommendations in bite-sized form.

💸 Money in AI
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.