👁👂🏻 Multi-Modal Learning is Becoming Real
📝 Editorial

One of the cognitive marvels of the human brain is its ability to simultaneously process information from different sensory inputs, such as speech, touch, or vision, in order to accomplish a specific task. From infancy, we learn to develop representations of the world based on many different modalities such as objects, sounds, and verbal descriptions. Recreating the ability to learn from different modalities simultaneously has long been a goal of ML, but most of those efforts remained confined to research exercises. For decades, most supervised ML models have been highly optimized for a single representation of the information.

That’s rapidly changing now. Multimodal ML is becoming a reality. In the last two years, we have seen the emergence of multimodal ML models applied to real-world scenarios. Natural language and computer vision have been a powerful combination, with the release of models such as OpenAI’s DALL-E or NVIDIA’s GauGAN. This week, Meta AI Research released a new model that combines audio and visual inputs to improve speech recognition. The model uses self-supervision techniques to analyze lip movements from unlabeled videos. That idea would have sounded insane a handful of years ago. While there are still plenty of milestones to reach in individual deep learning modalities, multimodal learning is an essential step towards the goal of building general AI. Little by little, such steps are making it more and more real.

🔺🔻 TheSequence Scope is our Sunday free digest.
To receive high-quality educational content about the most relevant concepts, research papers, and developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:
Edge#155: we discuss A/B testing for ML models; we explore how Meta AI uses ML A/B testing to improve its news feed ranking; we overview W&B, one of the top ML experimentation platforms on the market.
Edge#156: we dive deep into the ML mechanisms that power recruiting recommendations at LinkedIn.

Now, let’s review the most important developments in the AI industry this week.

🔎 ML Research

Audio-Visual Models for Speech Recognition
Meta AI Research (FAIR) published a paper proposing a technique that uses both audio and vision to better understand speech →read more on the FAIR team blog

Hidden Agenda
Researchers from DeepMind and Harvard University proposed Hidden Agenda, a two-team social deduction game used to help reinforcement learning agents develop cooperative mechanics →read more in the original research paper

Transformers and Semi-Supervised Learning for Video
Amazon Research published two papers about novel video intelligence techniques powered by transformers and self-supervised learning →read more on Amazon Research blog

Training Rescoring for Speech Recognition
Staying with Amazon Research: the tech giant published a paper proposing an NLU-based method for training the rescoring step in the speech recognition models used in the Alexa digital assistant →read more on Amazon Research blog

🤖 Cool AI Tech Releases

NVIDIA Canvas
Canvas, NVIDIA’s generative art toolset, got a few updates this week →read more on NVIDIA blog

NVIDIA Omniverse
NVIDIA Omniverse is a newly announced studio for creating virtual worlds →read more on NVIDIA blog

🛠 Real World ML
Noted computer scientist Chip Huyen published a post detailing common challenges and solutions in real-time ML →read more in Huyen’s original post

🐦 Follow us on Twitter where we share all our recommendations in bite-sized form

💸 Money in AI
AI&ML
AI-powered