TheSequence - 📹 🤖 Transformers for Video
📝 Editorial
Transformers are widely acknowledged as the most important development in deep learning architectures of the last decade. The impact of transformers on natural language understanding (NLU) tasks has challenged the imagination of even the most hard-core believers in neural networks. In recent years, we have seen steady contributions of transformers to domains such as computer vision, but mostly in image-related tasks such as classification. Now transformer architectures are expanding into a new frontier: video intelligence.
The idea of using transformers for video intelligence tasks makes a lot of sense. Typically, video intelligence techniques require large amounts of labeled data to understand the predicted actions in a video frame. Transformers excel at learning from unlabeled datasets, and there are a lot of videos available on the internet to learn from. Just like in NLU tasks, transformer models could be pretrained on large sets of unlabeled videos and fine-tuned for specific tasks.
Last week, OpenAI unveiled its work on video pretraining (VPT) models. This type of model adapts the principle of transformers to video intelligence tasks. To push the boundaries, OpenAI pretrained VPT on Minecraft videos, and the model was able to master tasks that required large training pipelines with techniques such as reinforcement learning, which have produced some of the best results in video intelligence tasks in recent years.
With GPT-3, OpenAI established something of a gold standard for transformers in NLU tasks. They followed up with their work on DALL-E and DALL-E 2 to apply transformers to both image and language tasks. VPT seems to be their first major step in extending this work into the area of video intelligence. Maybe VPT is the foundation for OpenAI’s new supermodel.
🔺🔻TheSequence Scope, our Sunday edition with the industry’s development overview, is free.
To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻
🗓 Next week in TheSequence Edge:
Edge#203: we explain what Graph Recurrent Neural Networks are, discuss GNNs on Dynamic Graphs, and explore DeepMind’s Jraph, a GNN library for JAX.
Edge#204: we deep dive into Imagen, Google’s impressive text-to-image alternative to OpenAI’s DALL-E 2.
Now, let’s review the most important developments in the AI industry this week
🔎 ML Research
Mastering Minecraft with Video Pretraining
OpenAI published a paper detailing video pretraining (VPT), a semi-supervised imitation learning method that was able to learn to play Minecraft from unlabeled datasets →read more on OpenAI blog
QML Improvements
AI labs from Google, Microsoft, Caltech, Harvard and others collaborated on quantum ML (QML) techniques that show tangible improvements over classical counterparts →read more on Google Research blog
Swin Transformer Improvements
Microsoft Research published details about improvements to Swin Transformer, its 3-billion-parameter computer vision model →read more on Microsoft Research blog
GODEL
Microsoft Research published a paper detailing GODEL, a new form of pretrained language model that also leverages external datasets, allowing it to focus on specific tasks or engage in open-ended conversation →read more on Microsoft Research blog
📌 Event: June 29th – Arize:Observe Unstructured
Only three days left to register for Arize:Observe Unstructured. This free, virtual event on Wednesday features an all-star lineup of speakers, including speakers from OpenAI and Hugging Face, the creator of UMAP & more! Register now.
🤖 Cool AI Tech Releases
GitHub Copilot GA
GitHub’s AI-based pair programming agent reached general availability →read more on GitHub blog
TorchGeo
PyTorch open-sourced TorchGeo, a library for processing geospatial data in ML models →read more on PyTorch blog
🛠 Real World ML
PyTorch at Disney
Disney Media & Entertainment Distribution (DMED) detailed the PyTorch architecture used for activity recognition across video, audio, and text datasets →read more on PyTorch blog
💸 Money in AI
Acquisitions