👁👂🏻 Multi-Modal Learning is Becoming Real
Was this email forwarded to you? Sign up here

📝 Editorial

One of the cognitive marvels of the human brain is its ability to simultaneously process information from different sensory inputs, such as speech, touch, and vision, to accomplish a specific task. From infancy, we learn to build representations of the world from many different modalities: objects, sounds, verbal descriptions, and others. Recreating the ability to learn from different modalities simultaneously has long been a goal of ML, but most of those efforts remained confined to research exercises. For decades, most supervised ML models have been highly optimized for a single representation of the information. That's rapidly changing now: multimodal ML is becoming a reality. In the last two years, we have seen the emergence of multimodal ML models applied to real-world scenarios. Natural language and computer vision have proven a powerful combination, with the release of models such as OpenAI's DALL-E or NVIDIA's GauGAN. This week, Meta AI Research released a new model that combines audio and visual inputs to improve speech recognition. The model uses self-supervision techniques to analyze lip movements in unlabeled videos. That idea would have sounded insane a handful of years ago. While there are still plenty of milestones to reach in individual deep learning modalities, multimodal learning is an essential step toward the goal of building general AI. Little by little, such steps are making it more and more real.

🔺🔻 TheSequence Scope is our Sunday free digest.
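The core idea behind such audio-visual models can be illustrated with a toy late-fusion sketch. This is not Meta's actual implementation; the feature dimensions, the linear "recognizer", and all names here are hypothetical placeholders standing in for learned feature extractors and a transformer encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features: 40-dim audio features (e.g. filterbanks)
# and 64-dim visual embeddings of the lip region, for T aligned frames.
T = 100
audio_feats = rng.normal(size=(T, 40))
visual_feats = rng.normal(size=(T, 64))

# Simplest form of multimodal fusion: concatenate the per-frame features
# so a downstream model sees both modalities at once.
fused = np.concatenate([audio_feats, visual_feats], axis=1)

# A linear stand-in for the recognizer, mapping each fused frame
# to scores over 30 hypothetical output symbols.
W = rng.normal(size=(104, 30))
logits = fused @ W
print(logits.shape)
```

Real systems replace the concatenation and linear map with learned encoders trained via self-supervision, but the shape bookkeeping, aligning two modalities frame by frame and feeding them jointly to one model, is the same.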
To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#155: we discuss A/B testing for ML models; we explore how Meta AI uses ML A/B testing to improve its news feed ranking; we overview W&B, one of the top ML experimentation platforms on the market.

Edge#156: we deep dive into the ML mechanisms that power recruiting recommendations at LinkedIn.

Now, let's review the most important developments in the AI industry this week.

🔎 ML Research

Audio-Visual Models for Speech Recognition
Meta AI Research (FAIR) published a paper proposing a technique that uses both audio and vision to better understand speech →read more on the FAIR team blog

Hidden Agenda
Researchers from DeepMind and Harvard University proposed Hidden Agenda, a two-team social deduction game used to help reinforcement learning agents develop cooperative mechanics →read more in the original research paper

Transformers and Semi-Supervised Learning for Video
Amazon Research published two papers about novel video intelligence techniques powered by transformers and self-supervised learning →read more on the Amazon Research blog

Training Rescoring for Speech Recognition
Staying with Amazon Research: the tech giant published a paper proposing an NLU-based method to rescore training in the speech recognition models used in the Alexa digital assistant →read more on the Amazon Research blog

🤖 Cool AI Tech Releases

NVIDIA Canvas
Canvas, NVIDIA's generative art toolset, got a few updates this week →read more on the NVIDIA blog

NVIDIA Omniverse
NVIDIA Omniverse is a newly announced studio for creating virtual worlds →read more on the NVIDIA blog

🛠 Real World ML

Noted computer scientist Chip Huyen published a post detailing common challenges and solutions for real-time ML systems →read more in Huyen's original post

🐦 Follow us on Twitter, where we share
all our recommendations in bite-sized form.

💸 Money in AI
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.