Diplomacy: The AI Benchmark that Gets Us Closer to the Turing Test
📝 Editorial
A few days ago, we discussed the release of CICERO, a language model created by Meta AI that was able to master the complex game of Diplomacy. Last week, DeepMind published a paper in the Nature journal proposing a technique for cooperation among AI agents in Diplomacy. Little by little, Diplomacy is becoming one of the most interesting benchmarks for reasoning capabilities in large language models.

What makes Diplomacy so fascinating is that the game requires players to negotiate, form and betray alliances, and cooperate and compete in an immensely large, seven-player action space. Unlike other game environments, Diplomacy does not rely only on moves on the board but also on language interactions between the players. Computational approaches to solving Diplomacy have been tried since the 1980s, but the language understanding capabilities were simply not available. Just as chess, Go and video games proved to be fertile ground for the AI renaissance of the last decade, games like Diplomacy are going to set a benchmark for a new generation of models that can collaborate with humans in really complex language tasks.

A fascinating way to think about Diplomacy is as a benchmark that includes some of the key challenges of the theoretical Turing test. This is highly debatable, as the Turing test is more focused on imitating human behavior than anything else. However, complex negotiation and dialog engagement are definitely a key part of it. From that perspective, solving Diplomacy is certainly a step in the right direction. For now, Meta AI and DeepMind are off to the races with Diplomacy models.

🔺🔻 TheSequence Scope, our Sunday edition with the industry’s development overview, is free.
To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:
Edge#251: Our series about ML interpretability explores the concept of global model-agnostic interpretability methods.
Edge#252: We discuss DreamFusion, Google’s new text-to-3D generative model.

🔎 ML Research

Another Diplomacy AI Agent
DeepMind published a paper detailing an AI agent that was able to cooperate, negotiate and master the Diplomacy board game. This comes days after Meta AI unveiled CICERO, another AI agent that achieved top human performance in Diplomacy -> Read more.

Data Scarcity and Generative AI
Researchers from MIT published a fascinating paper highlighting the challenges that data scarcity poses for pretraining large language models -> Read more.

The AlphaCode Paper
DeepMind published the official paper behind AlphaCode, its agent that can solve competitive programming tasks -> Read more.

Evaluating Input Saliency
Google Brain published a paper proposing a method to evaluate input salience methods -> Read more.

Dexterity Training for a Robot Hand
NVIDIA Research published a paper detailing DeXtreme, a technique used to teach dexterity to a robot hand -> Read more.

🤖 Cool AI Tech Releases

ML for Sheets
Google released Simple ML for Sheets, a Google Sheets extension that allows the use of TensorFlow models -> Read more.

Building Recommender Systems with TensorFlow
TensorFlow published a page of resources for building recommender systems -> Read more.

OpenVINO-Torch-ORT Integration
Microsoft and Intel open sourced an integration of OpenVINO and Torch-ORT to build faster inference models in PyTorch -> Read more.

🛠 Real World ML

Summarizing Slack Content
Salesforce Research details the approach used to summarize the content of Slack channels using generative AI -> Read more.

💸 Money in AI