TheSequence - 📹 🤖 Transformers for Video
📝 Editorial

Transformers are widely regarded as the most important development in deep learning architectures of the last decade. Their impact on natural language understanding (NLU) tasks has exceeded the expectations of even the most hard-core believers in neural networks. In recent years, transformers have also made steady contributions to domains such as computer vision, though mostly in image-related tasks such as classification. Now transformer architectures are expanding into a new frontier: video intelligence.

The idea of using transformers for video intelligence tasks makes a lot of sense. Video intelligence techniques typically require large amounts of labeled data to recognize the actions taking place in a video. Transformers excel at learning from unlabeled datasets, and there are plenty of videos on the internet to learn from. Just as in NLU tasks, transformer models could be pretrained on large sets of unlabeled videos and then fine-tuned for specific tasks.

Last week, OpenAI unveiled its work on Video PreTraining (VPT) models, which adapt the principles of transformers to video intelligence tasks. To push the boundaries, OpenAI pretrained VPT on Minecraft videos, and the model was able to master tasks that had required large training pipelines built on techniques such as reinforcement learning, an approach behind some of the best results in video intelligence in recent years.

With GPT-3, OpenAI established something of a gold standard for transformers in NLU tasks. It followed up with DALL-E and DALL-E 2, applying transformers to tasks that combine images and language. VPT seems to be its first major step in extending this work into video intelligence. Maybe VPT is the foundation for OpenAI's next supermodel.

🔺🔻 TheSequence Scope, our Sunday edition with an overview of the industry's developments, is free.
To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#203: we explain what Graph Recurrent Neural Networks are, discuss GNNs on dynamic graphs, and explore DeepMind's Jraph, a GNN library for JAX.

Edge#204: we deep dive into Imagen, Google's impressive text-to-image alternative to OpenAI's DALL-E 2.

Now, let's review the most important developments in the AI industry this week.

🔎 ML Research

Mastering Minecraft with Video Pretraining
OpenAI published a paper detailing Video PreTraining (VPT), a semi-supervised imitation learning method that learned to play Minecraft primarily from unlabeled video →read more on OpenAI blog

QML Improvements
AI labs from Google, Microsoft, Caltech, Harvard, and other institutions collaborated on quantum ML (QML) techniques that show tangible improvements over their classical counterparts →read more on Google Research blog

Swin Transformer Improvements
Microsoft Research published details about improvements to Swin Transformer, its 3-billion-parameter computer vision model →read more on Microsoft Research blog

GODEL
Microsoft Research published a paper detailing GODEL, a new pretrained language model that also leverages external data sources, allowing it to focus on specific tasks or engage in open-ended conversation →read more on Microsoft Research blog

📌 Event: Arize:Observe Unstructured, June 29th
Only three days left to register for Arize:Observe Unstructured. This free, virtual event on Wednesday features an all-star lineup that includes speakers from OpenAI and Hugging Face, the creator of UMAP, and more! Register now.
🤖 Cool AI Tech Releases

GitHub Copilot GA
GitHub's AI-based pair programming assistant reached general availability →read more on GitHub blog

TorchGeo
The PyTorch blog introduced TorchGeo, an open-source PyTorch library for processing geospatial data in ML models →read more on PyTorch blog

🛠 Real World ML

PyTorch at Disney
Disney Media & Entertainment Distribution (DMED) detailed the PyTorch architecture it uses for activity recognition across video, audio, and text datasets →read more on PyTorch blog

💸 Money in AI
Acquisitions