AI agents are on the rise, though the concept itself isn't novel. Let's first walk through the established categorization of agents to understand how they are used today, and then look at why they are becoming even more popular now.

- Simple Reflex Agents – imagine asking a chatbot, "I need help with my order." The agent searches for specific keywords ('help') it has been programmed to recognize and provides a reply.
- Model-Based Reflex Agents – think of Apple's Siri, Amazon's Alexa, and smart-home systems, which maintain an internal model of user preferences, interests, and behaviors based on past interactions.
- Goal-Based Agents – a chess-playing program that decides its moves based on a strategy to win the game.
- Utility-Based Agents – a finance app advising on investments for maximum return.
- Learning Agents – a prime example is AlphaGo/AlphaZero, which learned Go and chess through self-play and reinforcement learning, becoming extraordinarily skilled without human-coded rules. Another example is non-player characters (NPCs) in a video game that adapt and learn from player actions, creating a more dynamic and challenging gameplay experience.

Hybrids of the agents above also exist. For example, an autonomous vehicle uses sensors (simple reflex), maps (model-based), destination goals (goal-based), safety and efficiency criteria (utility-based), and adapts to driving conditions and learns from user feedback (learning agent).
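To make the first two categories concrete, here is a minimal, purely illustrative Python sketch (not taken from any system mentioned in this issue) contrasting a simple reflex agent with a model-based reflex agent. All names here (handle_simple, ModelBasedAgent, user_model) are hypothetical.

```python
# Illustrative sketch: the same "help with my order" request handled by a
# simple reflex agent vs. a model-based reflex agent. Not production code.

def handle_simple(message: str) -> str:
    """Simple reflex agent: react only to keywords in the current input."""
    if "help" in message.lower():
        return "Sure - what do you need help with?"
    return "Sorry, I didn't understand that."

class ModelBasedAgent:
    """Model-based reflex agent: keeps an internal model of past interactions."""

    def __init__(self) -> None:
        self.user_model = {"last_order": None}  # hypothetical internal state

    def observe(self, key: str, value: str) -> None:
        # Update the internal model from percepts (e.g., a completed order).
        self.user_model[key] = value

    def respond(self, message: str) -> str:
        # Use the stored state to give a more specific answer than keywords allow.
        if "help" in message.lower() and self.user_model["last_order"]:
            return f"Is this about your order {self.user_model['last_order']}?"
        return handle_simple(message)

agent = ModelBasedAgent()
agent.observe("last_order", "#1042")
print(handle_simple("I need help with my order"))  # keyword-only reply
print(agent.respond("I need help with my order"))  # reply informed by state
```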
So, if we are already using AI agents so widely, why is everybody doubling down on them now?

Advancements in NLP, boosted by powerful foundation models, along with increased computational power, have transformed AI agents. This, combined with multimodal capabilities and better function calling, is pushing them toward 'hyperagent' status – much as the internet turned text into hypertext. All of this has resulted in extended research, novel infrastructure, and many more attempts at practical implementation.

Just last week, a few releases highlighted the interest in developing the AI agent ecosystem:

- The Octopus v2 model, developed by Stanford University researchers, exemplifies the progress made in on-device language models. With its impressive performance and efficiency, Octopus v2 demonstrates the feasibility of deploying AI agents on edge devices, opening up possibilities for privacy-conscious and cost-effective solutions.
We also noticed how many coding AI agents were released last week:

- Cohere AI's C4AI Command R+, a 104B-parameter model, sets new standards for coding AI agents with advanced capabilities such as code rewrites, snippet generation, and multi-step tool use including Retrieval-Augmented Generation (RAG). Designed specifically for developers, it supports ten languages. It's released under a CC-BY-NC license, with the weights accessible for research use.
- Anthropic launches function calling, enhancing Claude 3. You can enable it to select the right tool from hundreds and to call external tools via APIs for complex tasks and calculations. It relies on detailed JSON tool descriptions for accurate selection, uses "chain of thought" reasoning for transparent decision-making, and can handle complex multi-tool scenarios, enhancing its utility for developers (a sketch of what a tool definition looks like follows below).
- Replit introduces AI tools integrated with their IDE, focusing on building large language models (LLMs) for code repair. By training models on a mix of source code and relevant natural language, they aim to create Replit-native models that power more capable developer tools.
- CodiumAI introduces Codiumate, an AI coding agent that boosts developer productivity by assisting with task planning and code completion. Codiumate streamlines the coding process with plan-aware auto-completion and quality tips, resulting in a significant boost in efficiency and a reduction in code errors.
- The SWE-agent, developed by researchers at Princeton University, transforms language models like GPT-4 into software engineering agents capable of addressing bugs and issues in real GitHub repositories.
A First Look at SWE-agent
Achieving a new benchmark, the SWE-agent successfully resolves 12.29% of issues on the full SWE-bench test set. This advancement is made possible by the introduction of the Agent-Computer Interface (ACI), which streamlines how the language model interacts with, analyzes, and modifies code within repositories.

AIDE, an AI-driven data science agent, achieves human-level performance in Kaggle competitions, autonomously outperforming half of the human competitors. It surpasses both traditional AutoML systems and ChatGPT (even with human help), excelling in over 60 challenges without any human input. AIDE operates through an iterative, feedback-driven approach, closely mimicking human data scientists' strategies but with greater efficiency and cost-effectiveness (a toy sketch of such a loop follows below).

And don't forget the recent release of a dedicated agent OS – AIOS, an LLM agent operating system designed at Rutgers University. It optimizes LLM integration by improving scheduling, resource allocation, and context maintenance for diverse agents. AIOS includes modules for agent scheduling, memory and storage management, and access control, significantly improving agent performance and efficiency. Open-sourced for broad access, AIOS represents a pivotal step toward a more cohesive and powerful ecosystem for LLM-based agents.

This evolution of AI agents opens up exciting possibilities for human-machine collaboration and the development of truly intelligent systems that can assist us across a wide range of tasks and domains.
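Circling back to Claude 3's function calling mentioned above: here is a rough sketch of what a JSON tool description and a tool-use call might look like with the anthropic Python SDK. The tool (get_stock_price), its schema, and the model choice are illustrative assumptions, not Anthropic's examples; check Anthropic's documentation for the exact, current format.

```python
# Hedged sketch of Claude 3 tool use (function calling) with the anthropic SDK.
# The tool name, its schema, and the model string are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "get_stock_price",  # hypothetical tool
        "description": "Get the latest closing price for a stock ticker.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "e.g. 'NVDA'"},
            },
            "required": ["ticker"],
        },
    }
]

response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": "What did NVDA close at today?"}],
)

# If the model decides to call a tool, the response contains a tool_use block
# with the arguments it selected based on the JSON description above.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```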
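And for the iterative, feedback-driven approach attributed to AIDE above, a toy propose-evaluate-refine loop that illustrates the general pattern. This is not AIDE's actual code; draft_solution, evaluate, and refine are hypothetical stand-ins for LLM calls and a validation harness.

```python
# Generic propose -> evaluate -> refine loop, in the spirit of feedback-driven
# agents like AIDE. All functions here are hypothetical placeholders.
import random

def draft_solution(task: str) -> str:
    """Stand-in for an LLM call that drafts a candidate solution."""
    return f"candidate solution for: {task} (seed {random.randint(0, 999)})"

def evaluate(solution: str) -> float:
    """Stand-in for running the candidate against a validation set."""
    return random.random()  # pretend this is a validation score

def refine(solution: str, score: float) -> str:
    """Stand-in for an LLM call that revises the solution given its score."""
    return solution + f" | refined after score {score:.2f}"

def solve(task: str, iterations: int = 5) -> str:
    candidate = draft_solution(task)
    best, best_score = candidate, evaluate(candidate)
    for _ in range(iterations):
        candidate = refine(best, best_score)  # feed the score back into the next attempt
        score = evaluate(candidate)
        if score > best_score:                # keep improvements, otherwise retry from best
            best, best_score = candidate, score
    return best

print(solve("predict house prices from tabular features"))
```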
10+ Tools for Hallucination Detection and Evaluation in Large Language Models
We share hallucination benchmarks you can use to detect and evaluate hallucinations in your large language models.
www.turingpost.com/p/llm-hallucination-benchmarks
News from The Usual Suspects ©

Microsoft
Microsoft and Quantinuum have achieved a notable quantum computing milestone by developing four highly reliable logical qubits from a configuration of 30 physical qubits. This advancement has led to an 800-fold improvement in the logical error rate, a critical step toward dependable quantum computing systems. Source: Microsoft Azure Quantum Blog.
OpenAI

Stability AI
Stability AI releases Stable Audio 2.0, advancing generative audio AI with text-to-audio and audio-to-audio capabilities that allow generation of tracks up to three minutes long. It focuses on musicality, practical application, and copyright respect, using licensed AudioSparx data.
Gretel AI
Gretel AI has introduced the largest open-source Text-to-SQL dataset, hosted on Hugging Face, to enhance AI research and model-training efficiency. The dataset, comprising 105,851 records across 100 domains, aims to democratize access to data insights and facilitate the development of AI applications capable of intuitive database interactions.
We are reading/watching:

Banjo Obayomi (@banjtheman):
"I pitted 14 LLMs against each other in an actual chatbot arena, where they fought 314 intense Street Fighter matches! 🥊🤖 In the end @AnthropicAI Claude models emerged victorious, with Claude 3 Haiku topping the leaderboard due to its quick responses and optimal moves." – Apr 3, 2024
The freshest research papers, categorized for your convenience

Enhancements in Language Model Capabilities
- Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models: Introduces a method for improving LLMs' spatial reasoning by visualizing thought processes →read the paper.
- Many-shot Jailbreaking: Explores a novel adversarial attack strategy against LLMs, emphasizing the power of long contexts →read the paper.
- Advancing LLM Reasoning Generalists with Preference Trees: Discusses enhancing LLMs for complex reasoning tasks using preference trees →read the paper.
- Language Models as Compilers: Improves algorithmic reasoning in LLMs by simulating pseudocode execution →read the paper.
- ReFT: Representation Fine-tuning for Language Models: Introduces a novel method for fine-tuning LLMs on specific tasks without updating the model weights →read the paper.
- Training LLMs over Neurally Compressed Text: Investigates the efficiency of training LLMs on compressed text, aiming for better performance with fewer resources →read the paper.
Multimodal and Multilingual Models
- AURORA-M: The First Open Source Multilingual Language Model: Showcases an open-source multilingual LLM designed for AI safety and multilingual capabilities →read the paper.
- MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview, and Text: Offers a method for generating 3D scenes from multimodal inputs, enhancing the creation of diverse 3D environments →read the paper.
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward: Enhances video QA task performance through a novel framework leveraging language models →read the paper.
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model: Explores efficient multimodal interactions in foundation models, particularly focusing on smaller-scale models →read the paper.
- MiniGPT4-Video: Advances multimodal LLMs for video understanding, integrating visual-textual tokens for better content interpretation →read the paper.
Language and Speech Models
- WavLLM: Towards Robust and Adaptive Speech Large Language Models: Introduces a speech LLM with improved capabilities in speech understanding and speaker recognition →read the paper.
- Poro 34B and the Blessing of Multilinguality: Presents a multilingual model that enhances capabilities in Finnish, English, and code, underscoring the benefits of multilingual training →read the paper.
- HyperCLOVA X Technical Report: Describes a family of LLMs with a focus on the Korean language, highlighting advances in bilingual and multilingual proficiency →read the paper.
Novel Applications and Methods
- LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models: Demonstrates the use of LLMs in designing adaptive bitrate (ABR) algorithms for networking, showcasing innovative applications beyond traditional tasks →read the paper.
- Mixture-of-Depths: Proposes a method for transformers to dynamically allocate computational resources, offering a path toward more efficient models →read the paper.
- ChatGLM-Math: Enhances math problem-solving in LLMs using a self-critique pipeline, improving performance on challenging datasets →read the paper.
- AutoWebGLM: Develops a web navigation agent that outperforms existing models, emphasizing the role of LLMs in automating complex web interactions →read the paper.
- CodeEditorBench: Introduces a framework for evaluating LLMs on code-editing tasks, reflecting real-world development challenges →read the paper.
- RALL-E: Enhances text-to-speech synthesis through CoT prompting, addressing challenges in prosody prediction for improved TTS quality →read the paper.
Security, Ethics, and Training Enhancements
- Noise-Aware Training of Layout-Aware Language Models: Focuses on efficient training methods for extracting information from visually rich documents with minimal labeled data →read the paper.
- Bigger is not Always Better: Examines the scaling properties of Latent Diffusion Models, suggesting that smaller models can be more efficient and effective →read the paper.
If you decide to become a Premium subscriber, remember that, in most cases, you can expense this subscription through your company! Join our community of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading!
How was today's FOD? Please give us some constructive feedback.