Text-to-Video Games and 1-Bit Models: Two Monumental Generative AI Research Milestones in One Week
Two papers that open new possibilities for generative AI.
📝 Editorial

Every week brings an avalanche of research papers pioneering new techniques in generative AI, but only a tiny percentage of them contain contributions that truly push the boundaries of the space. Last week was exceptional, with two papers that could have a remarkable impact on the next few years of generative AI.
Google DeepMind continues to challenge our imagination when it comes to generative AI. Last week, the research lab unveiled Genie, a generative model that can create a playable 2D video game from a text description, a sketch, or a photo. What makes Genie remarkable is its ability to learn fine-grained controls while being trained solely on videos, which typically carry no labels for the actions being performed in them. Genie not only learns the actions from video sequences but also variations of these actions that are applicable to the same environment. Amazing! Genie is in its very early stages, but its impact could be profound. From simulations and gaming to robotics, the ability to generate interactive environments could become one of the next frontiers for generative AI.

1-Bit LLMs

Computational and memory costs are some of the biggest roadblocks to the adoption of LLMs. Techniques such as quantization can reduce inference time and memory use but often sacrifice accuracy. Recently, a team of researchers from Microsoft and the University of Chinese Academy of Sciences proposed an architecture called BitNet that uses an extreme form of quantization, a 1-bit model, to improve cost efficiency without sacrificing performance. Last week, the team doubled down and proposed a variant of the original BitNet called BitNet b1.58, which provides additional gains in cost-effectiveness, memory, latency, and throughput. BitNet b1.58 accomplishes this by restricting every weight to one of three values, which works out to about 1.58 bits per weight instead of the 16-bit representation typical of most LLMs. The implications of BitNet b1.58 for generative AI could be quite significant. The new architecture could open the door to scaling the training and inference of LLMs on commodity hardware, and, if nothing else, the efficiency gains over current architectures should be notable.
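To make the 1.58-bit figure concrete, here is a minimal Python sketch of where the number comes from and what it could mean for weight storage. It is an illustration under stated assumptions, not the authors' implementation: the helper names `absmean_ternarize` and `weight_memory_gib` are hypothetical, the rounding scheme follows the absmean quantization described in the BitNet b1.58 paper as we understand it, and the memory estimate assumes ideal packing of weights only (no activations, no packing overhead).

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of a weight that can take 3 distinct values."""
    return math.log2(3)


def absmean_ternarize(weights, eps=1e-8):
    """Map a flat list of float weights to values in {-1, 0, 1}.

    Assumed scheme: scale by the mean absolute weight, round to the
    nearest integer, then clip to the range [-1, 1].
    Returns the ternary weights and the scale used.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Idealized storage for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    ternary, scale = absmean_ternarize([0.9, -0.05, -1.4, 0.3])
    print(ternary)  # [1, 0, -1, 0]

    params = 7_000_000_000  # hypothetical model size, for illustration only
    fp16 = weight_memory_gib(params, 16)
    b158 = weight_memory_gib(params, bits_per_ternary_weight())
    print(f"fp16: {fp16:.2f} GiB, ternary: {b158:.2f} GiB")
```

Under these idealized assumptions the ratio between the two storage figures is 16 / log2(3), roughly a tenfold reduction for the weights. Real-world savings depend on how the ternary values are packed and on everything else held in memory at inference time.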
Both Genie and the 1-bit LLM represent major research milestones in areas where results like these seemed out of reach a few months ago. The pace of research in generative AI is breathtaking. Amazing times.

Learn from top GenAI experts at GenAI Productionize 2024, an industry-first summit on productionizing enterprise GenAI! We're only a week away from LinkedIn, Google, Coinbase, Roblox, Comcast, Fidelity, Procter & Gamble, Chegg, LlamaIndex and more teaching how to get GenAI apps into production, including practical strategies for governance, evaluation, and monitoring.

🔎 ML Research

Genie
Google DeepMind published a paper introducing Generative Interactive Environments (Genie), a model that can generate interactive, playable environments from a single image prompt. Genie was trained on a dataset of 2D game and robotics videos, and the approach seems quite generalizable to other domains.

1-Bit LLMs
Microsoft Research published a paper proposing BitNet b1.58, a 1-bit LLM variant that uses 1.58 bits per parameter, which leads to massive savings in computational and memory requirements without sacrificing performance. Unlike traditional 16-bit models, BitNet b1.58 encodes every weight as a ternary value in {-1, 0, 1} while matching the performance of a full-precision 16-bit model.

EMO
Alibaba Research published a paper detailing EMO, a framework for generating expressive videos from input audio and images. EMO combines a ReferenceNet network that extracts features with a diffusion model that generates the final video frames.

Finetuning and Scaling
Google DeepMind published a paper analyzing the effectiveness of fine-tuning methods relative to the scale of LLMs. The analysis covers the effect of both data size and model size on fine-tuning algorithms.

Generating Better Images with Hierarchical Prompts
Microsoft Research published a paper detailing a technique to enhance images created by visual language models using hierarchical prompts. The method creates detailed graphs of image descriptions, which are then used to generate more detailed images.

🤖 Cool AI Tech Releases

Mistral Large
Mistral announced its biggest model so far, Mistral Large, which comes close to the performance of GPT-4 across several benchmarks.

Le Chat
Mistral also unveiled Le Chat, a ChatGPT competitor built on its foundation models.

Samba-1
NVIDIA competitor SambaNova released Samba-1, a one-trillion-parameter model optimized for enterprise scenarios.

StarCoder2
BigCode released StarCoder2, an open source code generation LLM.

🛠 Real World ML

AI-Assisted Development at Pinterest
Pinterest discusses lessons learned and best practices for enabling AI-assisted development processes.

AI Code Generation at GitHub
GitHub shares some insights and best practices about AI code generation.