Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week
Two papers that open new possibilities for generative AI.
📝 Editorial: Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week

Every week, there is an avalanche of research papers pioneering new techniques in generative AI, but only a tiny percentage of those papers contain contributions that are truly going to push the boundaries of the space. Last week was exceptional in terms of published papers, with two that could have a remarkable impact on the next few years of generative AI.
Google DeepMind continues to challenge our imagination when it comes to generative AI. Last week, the research lab unveiled Genie, a generative model that can create a playable 2D video game from a text description, a sketch, or a photo. What makes Genie remarkable is its ability to learn fine-grained controls while being trained solely on videos, which is notable because videos typically don't include labels for the actions being performed in them. Genie not only learns the actions from video sequences but also variations of these actions that are applicable to the same environment. Amazing! Genie is in the super early stages, but its impact could be profound. From simulations and gaming to robotics, the ability to generate interactive environments could become one of the next frontiers for generative AI.

1-Bit LLMs

Computational and memory costs are some of the biggest roadblocks to the adoption of LLMs. Techniques such as quantization can improve inference time but often sacrifice accuracy. Recently, a team of researchers from Microsoft and the University of Chinese Academy of Sciences proposed an architecture called BitNet that uses an extreme, 1-bit form of quantization to improve cost efficiency without sacrificing performance. Last week, the team doubled down and proposed a variant of the original BitNet called BitNet b1.58, which provides additional gains in cost-effectiveness, memory, latency, and throughput. BitNet b1.58 accomplishes this by representing each weight of the model with only 1.58 bits instead of the 16-bit representation typical of most LLMs. The implications of BitNet b1.58 for generative AI could be quite significant. The new architecture could open the door to scaling the training and inference of LLMs on commodity hardware, and, if nothing else, the performance gains in current architectures should be notable.

Both Genie and the 1-bit LLM represent major research milestones in areas that were deemed impossible a few months ago. The pace of research in generative AI is breathtaking. Amazing times.

Learn from top GenAI experts at GenAI Productionize 2024, an industry-first summit on productionizing enterprise GenAI! We're only a week away from LinkedIn, Google, Coinbase, Roblox, Comcast, Fidelity, Procter & Gamble, Chegg, LlamaIndex, and more teaching how to get GenAI apps into production, including practical strategies for governance, evaluation, and monitoring.

🔎 ML Research

Genie
Google DeepMind published a paper introducing generative interactive environments (Genie), a model that can generate interactive, playable environments from a single image prompt. Genie was trained on a dataset of 2D game and robotics videos, and the approach seems quite generalizable to other domains —> Read more.

1-Bit LLMs
Microsoft Research published a paper proposing BitNet b1.58, a 1-bit LLM variant that uses 1.58 bits per parameter, leading to massive savings in computational and memory requirements without sacrificing performance. Unlike traditional 16-bit models, BitNet b1.58 uses a ternary {-1, 0, 1} encoding for every weight, yet it matches the performance of full-precision 16-bit models (a minimal sketch of this ternary quantization appears at the end of this issue) —> Read more.

EMO
Alibaba Research published a paper detailing EMO, a framework for generating expressive videos from input audio and images. EMO combines a ReferenceNet network for feature extraction with a diffusion model that generates the final video frames —> Read more.
Finetuning and Scaling
Google DeepMind published a paper analyzing the effectiveness of fine-tuning methods relative to the scale of LLMs. The analysis covers the effect of both data size and model size on fine-tuning algorithms —> Read more.

Generating Better Images with Hierarchical Prompts
Microsoft Research published a paper detailing a technique to enhance images created by visual language models using hierarchical prompts. The method creates detailed graphs of image descriptions, which are then used to generate more detailed images —> Read more.

🤖 Cool AI Tech Releases

Mistral Large
Mistral announced its biggest model so far, Mistral Large, which matches the performance of GPT-4 across several benchmarks —> Read more.

Le Chat
Mistral also unveiled Le Chat, a ChatGPT competitor built on its foundation models —> Read more.

Samba-1
NVIDIA competitor SambaNova released Samba-1, a one-trillion-parameter model optimized for enterprise scenarios —> Read more.

StarCoder2
BigCode released StarCoder2, an open source code generation LLM —> Read more.

🛠 Real World ML

AI-Assisted Development at Pinterest
Pinterest discusses lessons learned and best practices for enabling AI-assisted development processes —> Read more.

AI Code Generation at GitHub
GitHub shares some insights and best practices about AI code generation —> Read more.

📡 AI Radar
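A closing note on the 1-Bit LLMs item above: to make the 1.58-bit idea a bit more concrete, here is a minimal NumPy sketch of the ternary "absmean" weight quantization described in the BitNet b1.58 paper, in which every weight is mapped to {-1, 0, +1} using the mean absolute value of the weight matrix as the scale. The function name and the toy layer are illustrative assumptions; the actual model applies this scheme during quantization-aware training, also quantizes activations, and relies on custom low-bit kernels, none of which are shown here.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to the ternary set {-1, 0, +1}.

    Sketch of the 'absmean' recipe described for BitNet b1.58:
    scale the matrix by its mean absolute value, round to the nearest
    integer, and clip to [-1, 1]. Returns the ternary matrix plus the
    scale needed to approximate the original (w ~= scale * w_ternary).
    """
    scale = np.abs(w).mean() + eps                     # per-matrix absmean scale
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary.astype(np.int8), scale

# Tiny demonstration on a random "layer"
rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4, 8)).astype(np.float32)
w_t, scale = absmean_ternary_quantize(w)

x = rng.normal(size=(1, 4)).astype(np.float32)
y_full = x @ w                     # full-precision matmul
y_tern = (x @ w_t) * scale         # ternary matmul: additions/subtractions, then one rescale
print("ternary weights:\n", w_t)
print("max abs error:", np.abs(y_full - y_tern).max())
```

Because every weight is -1, 0, or +1, the matrix multiplication reduces to additions and subtractions followed by a single rescale, which is where the memory and latency savings come from; log2(3) ≈ 1.58 bits per weight is what gives the architecture its name.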
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.