Diplomacy: The AI Benchmark that Gets Us Closer to the Turing Test

Dec 11

📝 Editorial

A few days ago, we discussed the release of CICERO, a language model created by Meta AI that was able to master the complex game of Diplomacy. Last week, DeepMind published a paper in Nature proposing a technique for cooperation among AI agents in Diplomacy. Little by little, Diplomacy is becoming one of the most interesting benchmarks for reasoning capabilities in large language models.

What makes Diplomacy so fascinating is that the game requires players to negotiate, form and betray alliances, and cooperate and compete in an immensely large, seven-player action space. Unlike other game environments, Diplomacy relies not just on moves on the board but also on language interactions between the players. Computational approaches to Diplomacy have been tried since the 1980s, but the necessary language understanding capabilities were simply not available. Just as chess, Go and video games proved to be fertile ground for the AI renaissance of the last decade, games like Diplomacy are going to set a benchmark for a new generation of models that can collaborate with humans on really complex language tasks.

A fascinating way to think about Diplomacy is as a benchmark that includes some of the key challenges of the theoretical Turing test. This is highly debatable, as the Turing test is more focused on imitating human behavior than anything else. However, complex negotiation and dialog engagement are definitely a key part of it. From that perspective, solving Diplomacy is certainly a step in the right direction.

For now, Meta AI and DeepMind are off to the races with Diplomacy models.

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#251: Our series about ML interpretability explores the concept of global model-agnostic interpretability methods.

Edge#252: We discuss DreamFusion, Google’s new text-to-3D generative model.

🔎 ML Research

Another Diplomacy AI Agent

DeepMind published a paper detailing an AI agent that was able to cooperate, negotiate and master the Diplomacy board game. This comes days after Meta AI unveiled CICERO, another AI agent that achieved top human performance in Diplomacy.

Data Scarcity and Generative AI

Researchers from MIT published a fascinating paper highlighting the challenges that data scarcity poses for pretraining large language models.

The AlphaCode Paper

DeepMind published the official paper behind AlphaCode, its agent that can solve competitive programming tasks.

Evaluating Input Saliency

Google Brain published a paper proposing a method for evaluating input salience methods.

Dexterity Training for a Robot Hand

NVIDIA Research published a paper detailing DeXtreme, a technique used to teach dexterity to a robot hand.

🤖 Cool AI Tech Releases

ML for Sheets

Google released Simple ML for Sheets, a Google Sheets extension that allows the use of TensorFlow models.

Building Recommender Systems with TensorFlow

TensorFlow published a dedicated page with resources for building recommender systems.

OpenVINO-Torch-ORT Integration

Microsoft and Intel open-sourced an integration of OpenVINO and Torch-ORT to build faster inference models in PyTorch.

🛠 Real World ML

Summarizing Slack Content

Salesforce Research detailed the approach used to summarize the content of Slack channels using generative AI.

💸 Money in AI

Runway ML raised $50 million to expand its generative AI platform for video editing.
Twelve Labs raised $12 million to develop models that understand contextual information in videos.
Israeli AI startup NeuReality raised a $35 million Series A to continue working on a high-performance AI inference chip.
Enterprise AI startup Protopia AI raised $6 million to expand its solution to derive insights from enterprise data sources while maintaining high levels of privacy.
Continuing the generative AI funding frenzy, SellScale announced that it raised $3.4 million to enable NLP capabilities for sales and marketing teams.
Gaia AI raised $3 million to apply AI to help with forest protection and management.
Pixel AI raised $1 million to use AI to help retailers improve their search experiences.
Akros Technologies raised $2.3 million to apply cutting-edge deep learning techniques to asset management.
