Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week
Two papers that open new possibilities for generative AI.

Next Week in The Sequence:
You can subscribe below!

📝 Editorial: Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week

Every week, there is an avalanche of research papers pioneering new techniques in generative AI, but only a tiny percentage of those papers contain contributions that truly push the boundaries of the space. Last week was exceptional in terms of published papers, with two that could have a remarkable impact on the next few years of generative AI.
Google DeepMind continues to challenge our imagination when it comes to generative AI. Last week, the research lab unveiled Genie, a generative model that can create a playable 2D video game from a text description, a sketch, or a photo. What makes Genie remarkable is its ability to learn fine-grained controls while being trained solely on videos. This is remarkable because videos typically don't include labels for the actions being performed in them. Genie not only learns actions from video sequences but also variations of those actions that apply to the same environment. Amazing! Genie is in its very early stages, but its impact could be profound. From simulations and gaming to robotics, the ability to generate interactive environments could become one of the next frontiers for generative AI.

1-Bit LLMs

Computational and memory costs are some of the biggest roadblocks to the adoption of LLMs. Techniques such as quantization can improve inference time but often sacrifice accuracy. Recently, a team of researchers from Microsoft and the University of Chinese Academy of Sciences proposed an architecture called BitNet that uses an extreme form of quantization, the 1-bit model, to improve cost efficiency without sacrificing performance. Last week, the team doubled down and proposed a variant of the original BitNet called BitNet b1.58, which provides additional gains in cost-effectiveness, memory, latency, and throughput. BitNet b1.58 accomplishes this by restricting each weight to one of three values, {-1, 0, 1}, which requires only log2(3) ≈ 1.58 bits per parameter instead of the typical 16-bit representation of most LLMs. The implications of BitNet b1.58 for generative AI could be quite significant. The new architecture could open the door to scaling the training and inference of LLMs on commodity hardware, and, if nothing else, the performance gains in current architectures should be notable.
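The ternary idea is easy to sketch in code. Below is a minimal NumPy illustration of absmean-style ternary quantization in the spirit of BitNet b1.58: scale the weight matrix by its mean absolute value, round, and clamp to {-1, 0, 1}. The function name and the use of NumPy (rather than a deep learning framework) are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def ternary_quantize(w):
    """Quantize a weight matrix to {-1, 0, 1} using absmean scaling."""
    gamma = np.abs(w).mean() + 1e-8      # scale factor; epsilon avoids divide-by-zero
    w_q = np.clip(np.round(w / gamma), -1, 1)  # round, then clamp to the ternary set
    return w_q, gamma

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))              # stand-in for a layer's weight matrix
w_q, gamma = ternary_quantize(w)

# Every quantized weight is -1, 0, or 1, so log2(3) ≈ 1.58 bits suffice per weight.
assert set(np.unique(w_q)) <= {-1.0, 0.0, 1.0}
```

Because the weights are ternary, matrix multiplication reduces to additions and subtractions (and skips for zeros), which is where the cost savings over 16-bit multiply-accumulate come from.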
Both Genie and the 1-Bit LLM represent major research milestones in areas that were deemed impossible just a few months ago. The pace of research in generative AI is breathtaking. Amazing times.

Learn from top GenAI experts at GenAI Productionize 2024 – an industry-first summit on productionizing enterprise GenAI! We're only a week away from LinkedIn, Google, Coinbase, Roblox, Comcast, Fidelity, Procter & Gamble, Chegg, LlamaIndex and more teaching how to get GenAI apps into production, including practical strategies for governance, evaluation, and monitoring.

🔎 ML Research

Genie

Google DeepMind published a paper introducing generative interactive environments (Genie), a model that can generate interactive, playable environments from a single image prompt. Genie was trained on a dataset of 2D games and robotic videos, and the approach seems quite generalizable to other domains —> Read more.

1-Bit LLMs

Microsoft Research published a paper proposing BitNet b1.58, a 1-bit LLM variant that uses 1.58 bits per parameter, which leads to massive savings in computational and memory requirements without sacrificing performance. Unlike traditional 16-bit models, BitNet uses a {-1, 0, 1} ternary encoding for every weight, matching the performance of full-precision 16-bit models —> Read more.

EMO

Alibaba Research published a paper detailing EMO, a framework for generating expressive videos from input audio and images. EMO combines a ReferenceNet network to extract features with a diffusion model that generates the final video frames —> Read more.

Finetuning and Scaling

Google DeepMind published a paper analyzing the effectiveness of fine-tuning methods relative to the scale of LLMs. The analysis covers the effect of both data and model size on fine-tuning algorithms —> Read more.

Generating Better Images with Hierarchical Prompts

Microsoft Research published a paper detailing a technique to enhance images created by visual language models using hierarchical prompts.
The method creates detailed graphs of image descriptions, which are then used to generate more detailed images —> Read more.

🤖 Cool AI Tech Releases

Mistral Large

Mistral announced its biggest model so far, Mistral Large, which matches the performance of GPT-4 across several benchmarks —> Read more.

Le Chat

Mistral also unveiled Le Chat, a ChatGPT competitor built on their foundation models —> Read more.

Samba-1

NVIDIA competitor SambaNova released Samba-1, a one-trillion-parameter model optimized for enterprise scenarios —> Read more.

StarCoder2

BigCode released StarCoder2, an open source code generation LLM —> Read more.

🛠 Real World ML

AI-Assisted Development at Pinterest

Pinterest discusses lessons learned and best practices for enabling AI-assisted development processes —> Read more.

AI Code Generation at GitHub

GitHub shares insights and best practices about AI code generation —> Read more.

📡 AI Radar
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.