It should be illegal to ship that many updates and releases so close to the holidays, but here we are: two weeks before Christmas, with our hands full of news and research papers (thank you very much, OpenAI's 12 days of shipping and booming NeurIPS!). Let's dive in: Sora, Genie 2 by Google DeepMind, and World Labs by Fei-Fei Li. It was truly a fascinating week. Be aware: there are a lot of videos in this newsletter! You might want to

But first, a reminder: we are piecing together expert views on the trajectory of ML and AI for 2025. Send your thoughts on what you believe 2025 will bring to ks@turingpost.com, or just reply to this email. Many, many thanks to those who have already shared their views.
Now, to the week's hottest topics: Sora, Genie 2 and World Labs
It's not exactly trivial to get access to Sora, and there are a few issues:
Lack of communication from the team: for example, OpenAI announced that Sora is included with ChatGPT Plus/Pro, but it wasn't for us, and nobody from the team could immediately clarify why. That's frustrating. We had to buy an additional subscription.
Overwhelming demand: their professional "12 days of Shipmas" hype-making created so much of it that Sam Altman had to say, "Signups will be disabled on and off, and generations will be slow for a while."
Regional restrictions: if you are in Europe or the UK, you simply can't get access to Sora.

But.
If and when you finally get your hands on it, Sora is pretty magnificent. It's actually quite incredible. Once again, OpenAI beats everyone with an intuitive user experience, delivering sophisticated technology to every noob out there. In every sense, it brings magic to muggles.

Enigmatic Computing Adventure (Sora by OpenAI)
One thing Sora doesn't allow, no matter how hard you try, is generating a realistic depiction of an actual person, even a historical figure. (In the video above, I attempted to create Alan Turing, of course!) Considering that competing models are likely to support this soon, it's a disadvantage, but an understandable one given the copyright-related legal battles OpenAI is currently involved in.
As noted in the presentation: if you're expecting Sora to produce a feature film for you, that's not going to happen. But consider how far we've come. Just two years ago, text-to-image generation was clumsy at best (ah, the nostalgia of extra fingers!). Now we can create entire video clips with intuitive storyboards, turning text into video, incorporating our own images, and refining the result into something surprisingly polished.

Enveloped by Technology (Sora by OpenAI)
And even if the laws of physics are still suffering, the progress is enormous.
Now to the nerdy part: this exciting progress ties closely to the concept of spatial intelligence, which we use daily, whether it's navigating a map, packing a suitcase, parking a car, or planning the steps of a complex recipe. Spatial intelligence aligns with the idea of "world models," a term introduced by David Ha and Jürgen Schmidhuber in their 2018 paper World Models. Since then, the discussion and development have advanced considerably.
Two World Models from last week
Google DeepMind introduced Genie 2, a large-scale foundation world model capable of generating diverse, action-controllable 3D environments from a single image or text prompt. Trained on extensive video datasets, Genie 2 can simulate various scenarios, including object interactions, character animations, and physical effects like gravity and lighting. Users can interact with these generated worlds in real time using standard inputs such as a keyboard and mouse.

Smoke 2 by Google DeepMind Genie 2

This development represents a significant advancement in the creation of adaptable training grounds for AI, enabling rapid prototyping of interactive experiences and providing diverse environments for training and evaluating embodied agents.
Similarly, World Labs, co-founded by AI pioneer Fei-Fei Li, unveiled an AI system that generates interactive 3D scenes from a single image. This system allows users to explore AI-generated scenes directly in a web browser, with the ability to move within the environment and interact with various elements. The technology adapts to different art styles and scenes, bringing the physics of real life into the virtual space.

World Labs Unveils AI System That Transforms Single Images into Interactive 3D Worlds

World Labs' approach focuses on creating large world models to perceive, generate, and interact with the 3D world, aiming to democratize the creation of virtual spaces and make the process faster and more accessible.
Diving into Genie 2 or World Labs' system, you'll discover they're nothing short of revolutionary. These systems take the foundational principles of World Models and push them into uncharted territory, evolving into rich, interactive 3D environments.
This leap, from task-specific applications to versatile, immersive systems, demonstrates the transformative power of world models. Spatial intelligence marks a fundamental shift, breaking free from the "flat" screen paradigm to embrace the three-dimensional way our minds are naturally wired to think, explore and interact.
The possibilities are truly thrilling.
If you like Turing Post, consider becoming a paid subscriber or sharing this digest with a friend. It helps us keep Monday digests free.

16 New Types of Retrieval-Augmented Generation (RAG): www.turingpost.com/p/16-new-types-of-rag
AI in Practice: Rats welcome robot-rat
AI infiltrates the rat world: New robot can interact socially with real lab rats

To add to that: Almost 10% Of South Korea's Workforce Is Now A Robot
We are reading: Intel on our mind (is it really dying?)
Rene Haas highlighted Intel's struggle between vertical integration and a fabless model, citing high costs and innovation challenges. He mentioned attempting to encourage Intel to license Arm technology and acknowledged the strategic benefits of vertical integration amid rumors of Arm's interest in acquiring parts of Intel.
Meanwhile, Ben Thompson argues that Intel's decline stems from its inability to adapt to mobile and efficiency-first computing, allowing ARM and TSMC to dominate. He highlights missed opportunities, such as Intel's refusal to embrace ARM manufacturing or prioritize power efficiency. While Pat Gelsinger's foundry plan aimed to address these issues, it was too late to reverse Intel's losses in AI and profitability. Thompson suggests that Intel's revival hinges on government-backed AI initiatives, positioning it as a vital domestic foundry for U.S. technological sovereignty.
SemiAnalysis attributes Intel's decline to decades of leadership failures, poor board decisions, and a loss of cultural and technical leadership. Firing CEO Pat Gelsinger and prioritizing financial engineering over innovation worsened the situation. Intel's delays in advanced nodes allowed competitors like TSMC and AMD to dominate. ARM-based architectures and hyperscaler custom chips further erode its market. Intel Foundry Services is seen as its last chance for relevance, requiring massive investment and government support to secure U.S. semiconductor independence. The article advocates divesting non-core businesses and focusing on revitalizing the foundry as Intel's lifeline.
Top Research: System Cards, Tech Reports and Surveys

Simon Willison (@simonw), 6:22 PM · Dec 5, 2024: "Here's the spiciest detail from the new o1 system card:"
Quoting OpenAI (@OpenAI): "The updated OpenAI o1 system card builds on prior safety work, detailing robustness evals, red teaming insights, and safety improvements using Instruction Hierarchy. It maintains a "medium" risk rating based on testing with an expanded suite of evaluations, reflecting it is safe… x.com/i/web/status/1…"
From 01.ai: Yi-Lightning Technical Report →read it here
This technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks →read the paper
Also, this:

Ksenia Se (@Kseniase_), 12:23 AM · Dec 9, 2024: "Reading about scaling laws recently I came by the interesting point: Focus on a balance between models' size and performance is more important than aiming for larger models. @Tsinghua_Uni and ModelBest Inc propose the idea of "capacity density" to measure how efficiently a model… x.com/i/web/status/1…"

→read the paper here
Models

AI at Meta (@AIatMeta), 5:01 PM · Dec 6, 2024: "As we continue to explore new post-training techniques, today we're releasing Llama 3.3, a new open source model that delivers leading performance and quality across text-based use cases such as synthetic data generation at a fraction of the inference cost. x.com/i/web/status/1…"
Efficient Track Anything, also from Meta AI, builds on Segment Anything Model 2 (SAM 2) to develop EfficientTAM for real-time video object tracking on resource-constrained devices with high accuracy and efficiency →read the paper
Amazon Nova Foundation Models for understanding and creative tasks, focusing on scalability, safety, multilingual support, and cost-efficiency →read the paper
PaliGemma 2 from Google DeepMind advances transfer learning with Vision-Language Models optimized for tasks like OCR, molecular structure recognition, and music score transcription →read the paper
NVILA by Nvidia reduces training and inference costs while maintaining high accuracy for tasks like medical imaging and robotic navigation →read the paper
You can find the rest of the curated research at the end of the newsletter.

News from The Usual Suspects ©
Sundar Pichai (@sundarpichai), 5:06 PM · Dec 9, 2024: "We see Willow as an important step in our journey to build a useful quantum computer with practical applications in areas like drug discovery, fusion energy, battery design + more. Details here: blog.google/technology/res…"
Linked post: Meet Willow, our state-of-the-art quantum chip. Our new quantum chip demonstrates error correction and performance that paves the way to a useful, large-scale quantum computer.
Thomas Wolf (@Thom_Wolf), 8:22 AM · Dec 9, 2024: "Four new visualisations of the rise of open-source AI models in 2024 added! - explore how tasks have been growing - how likes connect models together - the geography of models creators and followers"
Microsoft is seeing the big picture: Microsoft's new Copilot Vision brings real-time insights to Edge browser for Pro users. Aimed at enterprise decision-makers, it turns data into actionable visuals with the click of a button. Microsoft continues weaving AI deeper into everyday workflows.
OpenAI levels up with ChatGPT Pro and Reinforcement Fine-Tuning Research Program: OpenAI introduced ChatGPT Pro, offering unlimited access to all models, including the powerful "o1", for $200/month. It also expanded its Reinforcement Fine-Tuning (RFT) Research Program, which lets developers and ML engineers create expert models fine-tuned to excel at specific sets of complex, domain-specific tasks.
AWS Reinvents AI again: AWS drops the mic with cutting-edge AI updates at re:Invent 2024. Highlights include Multi-Agent Orchestration on Bedrock, the Nova AI Model Family, and Prompt Caching for big savings. Enterprises like Moody's are already reaping the benefits of AI-first workflows.
Salesforce measures AI's pulse: Salesforce's Agentforce platform is delivering on its promise with soaring adoption KPIs. Enterprise AI agents are automating workflows, driving real ROI, and making humans feel slightly less indispensable.
Canada gets cooler with AI: Cohere and CoreWeave are teaming up to build a cutting-edge data center in Canada. The collaboration promises to accelerate AI research while keeping the Great White North on the innovation map.
More interesting research papers from last week
Vision-Language Model Enhancements
Discriminative Fine-tuning of LVLMs: improves LVLMs by fine-tuning with contrastive and autoregressive losses, enhancing image-text discrimination and efficiency. Read the paper
Florence-VL: enhances multimodal understanding using a generative vision encoder with depth-breadth fusion, excelling in OCR and visual tasks. Read the paper
VLsI: optimizes smaller vision-language models using verbalized intermediate layers for efficiency and improved task performance. Read the paper
Datasets for LLMs and Physics Simulations
FineWeb2: Hugging Face democratizes AI research with FineWeb2, a high-quality 15T token dataset for diverse pretraining needs. Read the paper
The Well: supports physics-informed machine learning with diverse, high-resolution numerical simulations across domains. Read the paper
Model Optimization and Fine-Tuning
Weighted-Reward Preference Optimization: fuses capabilities of heterogeneous LLMs efficiently without requiring aligned vocabularies. Read the paper
TinyFusion: reduces diffusion transformer size and costs with adaptive pruning and distillation methods. Read the paper
Aim: optimizes multi-modal inference by pruning and merging redundant tokens for efficiency. Read the paper
Sparse and Multilingual Training
Monet: enables scalable and interpretable sparse mixture-of-expert models, specializing in language and domain knowledge. Read the paper
Marco-LLM: boosts multilingual performance, particularly for low-resource languages, using diverse, large-scale training. Read the paper
Task-Specific Innovations and Scaling
Establishing Task Scaling Laws: predicts task-specific LLM performance efficiently using compute-reduced "ladder models." Read the paper
Exploring Proportional Analogies: assesses LLM reasoning on analogies with targeted knowledge-enhanced prompting for improved accuracy. Read the paper
Multi-Agent and Collaborative Training
MALT: improves LLM reasoning by assigning collaborative roles in multi-agent setups for better task outcomes. Read the paper
Free Process Rewards Without Process Labels: trains process reward models efficiently using outcome labels instead of intermediate annotations. Read the paper

RAG and OCR Challenges
Leave a review!

Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!