|
This Week in Turing Post:
- Wednesday, AI 101, Models: the Mistral family
- Friday, AI Unicorns: Perplexity
| If you like Turing Post, consider upgrading, exploring this smarter way to research from our partners, or sharing this digest with a friend. It helps! |
|
|
Main topic: Small language models are on the rise |
Two things sparked my curiosity last week: the surge in papers and announcements related to small language models (SLMs), and Sébastien Bubeck’s recent move to OpenAI. |
Bubeck is notable for (at least) two achievements: |
- Co-authoring Sparks of AGI, the influential early study of GPT-4’s capabilities.
- Leading the phi family of small language models, beginning with Textbooks Are All You Need.
In his interview with Turing Post, Bubeck explained their intuition behind the approach they took in Textbooks Are All You Need: |
"Following the Sparks of AGI paper, we realized that to 'understand' what’s happening in LLMs, we had to try building our own. We had no experience training large transformers and limited data to begin with. Recognizing how hard it could be to evaluate any LLM we trained (given the maze of academic benchmarks), we decided to narrow the scope: coding was our target because of an existing large dataset (The Stack), a simple evaluation metric (OpenAI’s HumanEval), and prior evidence that ~1B parameter networks could handle this task reasonably. With only a few dozen GPUs, we aimed for a high HumanEval score using an SLM and restricted data. Filtering The Stack for 'educational content' (as identified by GPT-4) and creating 'synthetic textbooks' to diversify the data were crucial. After a month, we reached 50% on HumanEval and declared success. Then came the question: could this approach extend beyond coding? That’s when we tackled common-sense reasoning with phi-1.5 and general cognitive ability with phi-2, eventually reaching phi-3!" |
|
|
Not long ago, it was confirmed that OpenAI is collaborating with designer Jony Ive to develop an AI-powered hardware device aimed at a less socially intrusive computing experience than current smartphones. This project aligns well with Bubeck's vision of integrating AI models into everyday devices! |
In the same interview, Bubeck told us: |
"I can’t wait for SLMs like Phi-3 to be embedded everywhere. We’re already seeing this with Phi Silica, a derivative of Phi-3-mini, built specifically for the Copilot+ PCs announced on May 20, just before Build 2024. Windows will be the first platform to feature an in-box, state-of-the-art SLM, optimized for the NPU, by the end of this year. Eventually, I’d love to ask my watch to perform actions while I’m running or have an SLM on my phone while I hike, answering questions about what I’m seeing. The applications are endless." |
|
|
Given Bubeck's background and OpenAI's recent hardware initiatives, it’s reasonable to assume that OpenAI views SLMs as a major component of its strategy toward AGI. Bubeck’s focus at OpenAI will likely center on: |
- Developing Efficient AI Models for Hardware Integration: Drawing on his SLM expertise, Bubeck may work on compact AI models optimized for OpenAI's new hardware, ensuring peak performance on devices with limited resources.
- Enhancing On-Device AI Capabilities: He could contribute to advancing AI features that function directly on consumer devices, decreasing reliance on cloud computing and improving user privacy.
- Collaborating on Custom AI Chip Development: With OpenAI’s partnerships with Broadcom and TSMC to develop custom AI chips, Bubeck's insight could help create models tailored for these chips, boosting both efficiency and performance.
|
OpenAI has no plans to slow down. Last week it launched SearchGPT, an AI-powered search engine that combines real-time web information with conversational capabilities, positioning itself as a direct competitor to established search platforms like Google. With Bubeck on board, bringing his expertise in SLMs (and Sparks of AGI), OpenAI is casting an even wider net across the hottest topics in the field. |
|
Other companies accelerating their SLM game: |
|
It’s also worth noting that Qualcomm's CEO Cristiano Amon said he wants to "break the paradigm of the app construct," signaling a shift from traditional apps to AI agents on your devices. And what could be more efficient for this than SLMs? |
To wrap up on SLMs for today, check out this survey of small language models from thirteen reputable universities and AI labs. They offer a taxonomy that provides a structured approach to understanding and evaluating SLMs, focusing on: |
- How models are optimized (through architectural design, training efficiency, and compression).
- Which constraints are prioritized (e.g., compute, memory, energy) based on the intended application environment and deployment needs.
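To make the compression axis of that taxonomy concrete, here is a minimal, illustrative sketch (not from the survey itself) of symmetric int8 post-training quantization, one of the standard techniques for shrinking a model's memory footprint:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # a toy weight matrix
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32, at a small accuracy cost:
# the rounding error per weight is bounded by scale / 2.
error = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs error: {error:.4f}")
```

Real deployments layer finer-grained schemes (per-channel scales, 4-bit formats, quantization-aware training) on top of this basic idea, which is exactly the design space the survey's taxonomy organizes.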
|
| Image Source: The Survey of SLMs |
|
|
|
| 10 Open Multimodal Models | Our collection of the most powerful open MLLMs to help you find the best option for your specific needs | www.turingpost.com/p/10-open-mllms |
|
|
Weekly recommendation from AI practitioner👍🏼 |
Patchwork – an open-source toolset for merging and transforming datasets. Think of it as your essential toolkit for tidying up data chaos, with flexible, modular utilities designed for swift data integration across projects. |
💎 We also recommend our partners SciSpace – your research buddy. SciSpace has everything you need: over 280 million papers at your fingertips, easier lit reviews, and AI that chats with your PDFs to break things down for you. Trusted by 3M+ active users! |
|
|
News from The Usual Suspects © |
Breakthroughs? → |
- Osmo Labs Digitized Smell. “A fresh summer plum was the first fruit and scent to be fully digitized and reprinted with no human intervention.” →Alex Wiltschko’s twitter
- Decart & Etched introduced Oasis: A Fully AI-Generated Game World. It’s the first fully playable, real-time, open-world AI game, revolutionizing gaming with AI-generated experiences →Oasis’s GitHub
|
More news → |
- Waymo | EMMA: End-to-end Multimodal Driving Model, using Google’s Gemini, combines sensor and language data, excelling in trajectory prediction and object detection. Short-term memory and lack of LiDAR remain limitations →Waymo’s blog
- Amazon | AWS pits Q Developer against Copilot. Why might it be good? It’s backed by Claude 3.5 – a usual programmers’ choice →Amazon’s blog
- Google | Gemini’s “Grounding with Google Search” lets apps use live data, improving factual accuracy and trustworthiness →Google’s blog
- Google | Big Sleep, Google’s AI tool, found critical flaws in SQLite, showcasing AI’s potential to detect complex vulnerabilities in software →Project Zero’s blog
- Google | The new “Learn About” tool turns any search query into a structured, interactive learning experience, powered by Gemini →Learn About playground
|
Policy |
- A16z & Microsoft: Teaming Up for AI’s Future. They rally for a balanced AI landscape where both startups and giants can thrive. Their pitch: open-source AI, shared data pools, and policies to empower U.S. innovation – garage dreamers and corporate titans alike →Microsoft’s blog
- Anthropic’s Case for Targeted AI Regulation. Anthropic urges swift, focused AI regulation to curb potential risks. Their proposal: adaptive safety policies to keep up with model advancements, aiming for minimal red tape and maximum protection →Anthropic’s blog
|
AGI, again. Your thoughts: |
|
|
NVIDIA introduced HOVER (Versatile Neural Whole-Body Controller for Humanoid Robots) →their GitHub |
| Video: HOVER robot group |
|
|
We are reading |
|
Leave a review! |
|
The freshest research papers, categorized for your convenience |
Our TOP |
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA →read the paper |
| TuringPost @TheTuringPost | 1:24 PM • Oct 30, 2024 |
.@GoogleDeepMind, @GoogleAI and @kaist_ai introduce new methods to turn large LLMs into smaller models:
- Recursive Transformers that reuse layers multiple times
- Relaxed Recursive Transformers with LoRA
- Continuous Depth-wise Batching for speeding up processing
Details 🧵
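The core idea in the tweet – tie one block's weights across depth, then "relax" the tying with per-depth low-rank adapters – can be sketched in a few lines. This is a toy illustration of the concept, not the paper's actual architecture; the shared matrix stands in for a full transformer block:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, depth = 64, 4, 6  # hidden size, LoRA rank, number of recursions

# One shared ("tied") weight matrix stands in for a full transformer block.
W_shared = rng.normal(scale=0.02, size=(d, d))

# "Relaxation": each depth gets its own low-rank LoRA delta A @ B.
loras = [(rng.normal(scale=0.02, size=(d, r)),
          rng.normal(scale=0.02, size=(r, d))) for _ in range(depth)]

def recursive_forward(x: np.ndarray) -> np.ndarray:
    """Apply the same shared block `depth` times, each pass lightly
    specialized by its own layer-wise LoRA adapter."""
    for A, B in loras:
        x = np.tanh(x @ (W_shared + A @ B))  # tanh as a stand-in nonlinearity
    return x

x = rng.normal(size=(2, d))  # a toy batch of 2 token embeddings
y = recursive_forward(x)
print(y.shape)               # (2, 64)

# Payoff: one block plus `depth` small adapters instead of `depth` full blocks.
full = depth * d * d
shared = d * d + depth * 2 * d * r
print(f"params: {shared} (tied + LoRA) vs {full} (untied)")
```

Because the adapters are rank-`r` with `r << d`, the per-depth specialization costs far fewer parameters than untying the layers would.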
|
|
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
Researchers at Stellenbosch University examined strategies for reducing hallucinations in LLMs through prompt engineering and external tool integration. Testing approaches like self-consistency (SC) and Chain-of-Thought (CoT) on math and trivia tasks, they found SC best reduced hallucinations in reasoning tasks. Meanwhile, simpler prompts and avoiding tool complexity were more effective overall. Tool-using agents like ReAct increased hallucination rates, especially in less powerful LLMs, highlighting tool integration challenges →read the paper

Mind Your Step (By Step): CoT Can Reduce Performance on Tasks Where Thinking Makes Humans Worse
Princeton researchers identify tasks where chain-of-thought (CoT) reasoning degrades LLM performance. Testing across implicit statistical learning, visual recognition, and exception-based classification, CoT reduced accuracy by up to 36%. These reductions mirror human performance issues in similar tasks, linking specific cognitive constraints in humans to LLMs. However, CoT did not impair tasks like spatial reasoning or memory-intensive selection, highlighting cases where human-model constraints diverge →read the paper

Measuring Memorization Through Probabilistic Discoverable Extraction
Researchers from Google DeepMind and Boston University propose a probabilistic method to better measure memorization in LLMs. Current methods, focused on single-attempt extraction via greedy sampling, underestimate memorization. This study introduces the "(𝑛, 𝑝)-discoverable extraction" metric, capturing the probability of extracting memorized data across multiple attempts and sampling schemes →read the paper
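The intuition behind the last paper's metric is simple: even if one greedy decode never surfaces a memorized sequence, repeated sampling can. A minimal sketch (an illustrative reading of the metric, not the paper's exact formulation, and assuming independent attempts):

```python
def p_extract(per_attempt: float, n: int) -> float:
    """Probability of extracting a memorized sequence at least once in
    n independent sampling attempts, given a per-attempt success rate."""
    return 1.0 - (1.0 - per_attempt) ** n

# A sequence a single greedy decode would miss ("not memorized" under
# single-attempt metrics) can still be highly extractable:
q = 0.05  # 5% chance per sampled generation
for n in (1, 10, 100):
    print(f"n={n:3d}: {p_extract(q, n):.3f}")
```

This is why the paper argues single-attempt greedy extraction underestimates memorization: the extraction probability climbs quickly with the attempt budget n.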
|
Robotics & Embodied AI |
|
Language Model (LLM) Capabilities & Reasoning |
|
Optimization & Preference Tuning |
|
Memory Efficiency & Model Compression |
|
Agents & Multi-Agent Systems |
|
Surveys & Methodological Studies |
|
Training & Post-Training Optimization |
|
Interesting use cases |
|
Security & Vulnerability |
|
Retrieval & Dense Retrieval Optimization |
|
Transformers & Tokenization Innovation |
|
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription! |
|
|
|