Diplomacy: The AI Benchmark that Gets Us Closer to the Turing Test
📝 Editorial

A few days ago, we discussed the release of CICERO, a language model created by Meta AI that was able to master the complex game of Diplomacy. Last week, DeepMind published a paper in the Nature journal proposing a technique for cooperation among AI agents in Diplomacy. Little by little, Diplomacy is becoming one of the most interesting benchmarks for reasoning capabilities in large language models.

What makes Diplomacy so fascinating is that the game requires players to negotiate, form and betray alliances, and both cooperate and compete in an immensely large, seven-player action space. Unlike other game environments, Diplomacy relies not just on moves on the board but on language interactions between the players. Computational approaches to Diplomacy have been tried since the 1980s, but the necessary language understanding capabilities were simply not available. Just as chess, Go and video games proved to be fertile ground for the AI renaissance of the last decade, games like Diplomacy are going to set a benchmark for a new generation of models that can collaborate with humans on really complex language tasks.

A fascinating way to think about Diplomacy is as a benchmark that includes some of the key challenges of the theoretical Turing test. This is highly debatable, as the Turing test is more focused on imitating human behavior than anything else. However, complex negotiation and dialog engagement are definitely a key part of it. From that perspective, solving Diplomacy is certainly a step in the right direction. For now, Meta AI and DeepMind are off to the races with Diplomacy models.

🗓 Next week in TheSequence Edge:

Edge#251: Our series about ML interpretability explores the concept of global model-agnostic interpretability methods.

Edge#252: We discuss DreamFusion, Google’s new text-to-3D generative model.

🔎 ML Research

Another Diplomacy AI Agent
DeepMind published a paper detailing an AI agent that was able to cooperate, negotiate and master the Diplomacy board game. This comes days after Meta AI unveiled CICERO, another AI agent that achieved top human performance in Diplomacy.

Data Scarcity and Generative AI
Researchers from MIT published a fascinating paper highlighting the challenges that data scarcity poses for pretraining large language models.

The AlphaCode Paper
DeepMind published the official paper behind AlphaCode, its agent that can solve competitive programming tasks.

Evaluating Input Saliency
Google Brain published a paper proposing a method to evaluate input salience methods.

Dexterity Training for a Robot Hand
NVIDIA Research published a paper detailing DeXtreme, a technique used to teach dexterity to a robot hand.

🤖 Cool AI Tech Releases

ML for Sheets
Google released Simple ML for Sheets, a Google Sheets extension that allows the use of TensorFlow models.

Building Recommender Systems with TensorFlow
TensorFlow published a page of resources dedicated to building recommender systems.

OpenVINO-Torch-ORT Integration
Microsoft and Intel open sourced an integration of OpenVINO and Torch-ORT to build faster inference models in PyTorch.

🛠 Real World ML

Summarizing Slack Content
Salesforce Research details the approach used to summarize the content of Slack channels using generative AI.