The Sequence Chat: Salesforce Research's Junnan Li on Multimodal Generative AI
One of the creators of the famous BLIP-2 model shares his insights about the current state of multimodal generative AI.

👤 Quick bio
I'm a research scientist at Salesforce Research focusing on multimodal AI. I did my PhD in computer vision at the National University of Singapore, and I first got into computer vision and machine learning through my undergraduate final-year project.

🛠 ML Work
BLIP-2 is a scalable multimodal pre-training method that enables any large language model (LLM) to ingest and understand images. It unlocks zero-shot image-to-text generation and powers the world's first open-sourced multimodal chatbot prototype. Check out this blog post for more details: https://blog.salesforceairesearch.com/blip-2/ Before BLIP-2, we published BLIP, one of the most popular vision-and-language models and the 18th most-cited AI paper of 2022. BLIP-2 improves significantly over BLIP by effectively leveraging frozen pre-trained image encoders and LLMs.
BLIP-2 achieves zero-shot image-to-text generation by enabling LLMs to understand images, thereby harvesting the zero-shot text generation capability of the LLM. It is challenging for LLMs to understand images due to the domain gap between visual and textual representations. We propose a novel two-stage pre-training strategy to bridge this gap.
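The bridging component in BLIP-2 is a querying transformer (Q-Former) that distills the frozen image encoder's output into a small, fixed set of vectors the LLM can consume as soft prompts. The following is a toy numpy sketch of that single cross-attention idea only, with random weights and illustrative dimensions (ViT-g feature width 1408, Q-Former width 768, an assumed LLM width of 2560); it is not the actual implementation, which is a full BERT-style transformer trained in two stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Frozen image encoder output: 257 patch features of width 1408.
image_feats = rng.standard_normal((257, 1408))

# 32 learned query embeddings of width 768 (learned in training; random here).
queries = rng.standard_normal((32, 768))

# Cross-attention projections (likewise learned in practice).
W_q = rng.standard_normal((768, 64))
W_k = rng.standard_normal((1408, 64))
W_v = rng.standard_normal((1408, 768))

# The queries attend over the frozen image features, compressing a
# variable-size set of patches into a fixed-size set of 32 vectors.
attn = softmax((queries @ W_q) @ (image_feats @ W_k).T / np.sqrt(64))
distilled = attn @ (image_feats @ W_v)     # shape (32, 768)

# A linear layer maps the 32 vectors into the LLM's embedding space,
# where they act as soft visual prompts prepended to the text tokens.
W_proj = rng.standard_normal((768, 2560))  # hypothetical LLM width
soft_prompts = distilled @ W_proj          # shape (32, 2560)
print(soft_prompts.shape)
```

Because only the queries and projections are trained while the image encoder and LLM stay frozen, the number of trainable parameters stays small, which is what makes the method scalable across different backbone models.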
GPT-4 is amazing and demonstrates strong image-to-text generation capabilities. There are two key differences between BLIP-2 and GPT-4.
The world is multimodal by nature, so an AI agent that can understand and simulate the world needs to be multimodal. In my opinion, multimodal generative AI will drive the next wave of AI breakthroughs. There are many exciting areas, such as video generation and embodied multimodal AI.

💥 Miscellaneous – a set of rapid-fire questions
Self-supervised/unsupervised learning
I believe that open source is the preferable approach to driving safer, more responsible AI research that can benefit a broader community. However, it requires careful planning before open-sourcing a model to mitigate its potential risks.
Yes!
This question is out of my scope, so I cannot answer it :).