Diplomacy: The AI Benchmark that Gets Us Closer to the Turing Test
📝 Editorial

A few days ago, we discussed the release of CICERO, a language model created by Meta AI that was able to master the complex game of Diplomacy. Last week, DeepMind published a paper in the journal Nature proposing a technique for cooperation among AI agents in Diplomacy. Little by little, Diplomacy is becoming one of the most interesting benchmarks for reasoning capabilities in large language models.

What makes Diplomacy so fascinating is that the game requires players to negotiate, form and betray alliances, and cooperate and compete in an immensely large, seven-player action space. Unlike other game environments, Diplomacy does not rely just on moves on the board but also on language interactions between the players. Computational approaches to solving Diplomacy have been tried since the 1980s, but the language understanding capabilities were simply not available. Just as chess, Go and video games proved to be fertile ground for the AI renaissance of the last decade, games like Diplomacy are going to set a benchmark for a new generation of models that can collaborate with humans on really complex language tasks.

A fascinating way to think about Diplomacy is as a benchmark that includes some of the key challenges of the theoretical Turing test. This is highly debatable, as the Turing test is more focused on imitating human behavior than anything else. However, complex negotiation and dialog engagement are definitely a key part of it. From that perspective, solving Diplomacy is certainly a step in the right direction. For now, Meta AI and DeepMind are off to the races with Diplomacy models.

🗓 Next week in TheSequence Edge:

Edge#251: Our series about ML interpretability explores the concept of global model-agnostic interpretability methods.

Edge#252: We discuss DreamFusion, Google’s new text-to-3D generative model.

🔎 ML Research

Another Diplomacy AI Agent
DeepMind published a paper detailing an AI agent that was able to cooperate, negotiate and master the Diplomacy board game. This comes days after Meta AI unveiled CICERO, another AI agent that achieved top human performance in Diplomacy.

Data Scarcity and Generative AI
Researchers from MIT published a fascinating paper highlighting the challenges that data scarcity poses for pretraining large language models.

The AlphaCode Paper
DeepMind published the official paper behind AlphaCode, its agent that can solve competitive programming tasks.

Evaluating Input Saliency
Google Brain published a paper proposing a method to evaluate input salience methods.

Dexterity Training for a Robot Hand
NVIDIA Research published a paper detailing DeXtreme, a technique used to teach dexterity to a robot hand.

🤖 Cool AI Tech Releases

ML for Sheets
Google released Simple ML for Sheets, a Google Sheets extension that allows the use of TensorFlow models.

Building Recommender Systems with TensorFlow
TensorFlow published a dedicated page with resources for building recommender systems.

OpenVINO-Torch-ORT Integration
Microsoft and Intel open sourced an integration of OpenVINO and Torch-ORT to build faster inference models in PyTorch.

🛠 Real World ML

Summarizing Slack Content
Salesforce Research details the approach used to summarize the content of Slack channels using generative AI.

💸 Money in AI