Anthropic raises the industry bar for intelligence - Weekly News Roundup - Issue #472
Plus: Ilya Sutskever is back; Nvidia becomes the world's most valuable company; another company trials a humanoid robot; a military robot-dog arms race; a mad scientist grows rat neurons to play Doom

Hello and welcome to Weekly News Roundup Issue #472. This week, Anthropic released their latest model, Claude 3.5 Sonnet, which makes a solid bid for the top spot in the leaderboards. In other news, Ilya Sutskever is back with a new AGI lab. Meanwhile, Nvidia has become the most valuable company in the world. Over in robotics, another company is trialling a humanoid robot, and a military robot-dog arms race between China and the US has begun. We will finish with a deep dive into how AlphaFold 3 works and a story about a mad scientist growing rat neurons to play Doom. I hope you enjoy this week’s issue!

Anthropic has released Claude 3.5 Sonnet, a new model in its Claude family and the first Claude 3.5 model available to the public. According to Anthropic, its newest model raises the industry bar for intelligence and makes a solid bid for the top spot in the leaderboards.

If the benchmark results provided by Anthropic are to be believed, Claude 3.5 Sonnet is a massive improvement over Anthropic’s previous flagship model, Claude 3 Opus, and outperforms competitors such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro. Claude 3.5 Sonnet also follows the recent trend of making AI models not only smarter but also faster: it is two times faster than Claude 3 Opus.

Anthropic has also revealed that in its internal agentic coding evaluations, Claude 3.5 Sonnet solved 64% of problems, while Claude 3 Opus solved only 38% of problems in the same test.
Additionally, Anthropic reports that Claude 3.5 Sonnet is better at fixing bugs or adding new functionality to open-source projects, given a natural language description of what needs to be done. When equipped with relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities, says Anthropic. Claude 3.5 Sonnet is also good at translating code, which could make it useful in maintaining or migrating legacy codebases. Anthropic has not described in detail what the coding test looked like and only provided results in the Claude 3.5 Sonnet Model Card Addendum.

Claude 3.5 Sonnet is also better at understanding and working with visual inputs. Anthropic claims its latest model outperforms its competitors in almost all the tests it conducted, even if by a slight margin. At the moment, Claude 3.5 Sonnet can take in only images or PDFs and cannot analyse videos.

As with all first-party benchmark results, I recommend taking them with a grain of salt and treating them as marketing material. For a better picture of how Claude 3.5 Sonnet compares to other models, I recommend checking out independent benchmarks and leaderboards, such as LMSYS Chatbot Arena or HELM (at the time I am writing this, Claude 3.5 Sonnet is not listed on those leaderboards yet). However, if Anthropic's claims are true, then Claude 3.5 Sonnet should jump close to the top of those rankings very soon.

Together with Claude 3.5 Sonnet, Anthropic is also releasing Artifacts, a UI improvement that makes working with Claude easier. For tasks such as code generation, working with text documents, or data analysis, Claude will open a preview window next to the chat where it will display its output. This feature makes working with Claude much easier compared to ChatGPT and other chatbots.
For tasks like code generation, where you have to go back and forth and guide the chatbot to generate what you want, seeing a preview next to the chat saves a lot of scrolling up and down and makes the overall experience much more pleasant. The only issue I have with Artifacts is that after coming back to the chat from another conversation, the preview window is not open by default. It can be reopened through the Chat controls menu, but I expected it to be there when I came back to the conversation.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. Claude Pro and Team plan subscribers get higher rate limits, allowing them to send 5 times more messages to Claude 3.5 Sonnet compared to those on the Free plan. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Access to Claude 3.5 Sonnet through the Anthropic API is priced competitively. Anthropic’s newest model costs $3 per million input tokens and $15 per million output tokens. That is much cheaper than Claude 3 Opus, which costs $15 per million input tokens and $75 per million output tokens (making Claude 3.5 Sonnet five times cheaper on both input and output). For comparison, OpenAI charges $5 per million input tokens and $15 per million output tokens to use GPT-4o through the OpenAI API. Meanwhile, Google asks $3.50 per million input tokens and $10.50 per million output tokens to use Gemini 1.5 Pro.

Claude 3.5 Sonnet has a 200k token context window. GPT-4o’s context window is 128k tokens long, while Gemini 1.5 Pro offers a massive 1 million token context window, soon to be expanded to 2 million tokens.

With Claude 3.5 Sonnet, Anthropic presents a new level of performance for mid-tier models. If the benchmark results published by Anthropic are to be believed, Claude 3.5 Sonnet is on par with, if not better than, GPT-4o and Google Gemini 1.5 Pro. And let’s not forget that Sonnet is the mid-tier model in the Claude family.
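
As a rough illustration of the API prices quoted above, the sketch below estimates what a single request would cost on each model. The prices are the per-million-token figures from this issue; the workload (10,000 input tokens and 1,000 output tokens) and the helper name `request_cost` are made up for the example, so treat the output as a back-of-the-envelope comparison rather than a billing estimate.

```python
# Prices in USD per million tokens (input, output), as quoted in this issue.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "GPT-4o": (5.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: a long prompt with a short answer.
INPUT_TOKENS = 10_000
OUTPUT_TOKENS = 1_000

for model in PRICES:
    cost = request_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.4f}")
```

Because Claude 3 Opus costs five times more than Claude 3.5 Sonnet on both input and output, the ratio between the two stays at 5x whatever the mix of input and output tokens. The ranking against GPT-4o and Gemini 1.5 Pro, on the other hand, depends on how output-heavy the workload is.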
Anthropic promises to release the remaining models in the Claude 3.5 family (the light and fast Haiku and the most powerful Opus) later this year. I will keep my eyes on Claude 3.5 Opus, as it has the potential to raise the bar for other models quite significantly, judging by how much of an improvement Claude 3.5 Sonnet is.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it. For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous contribution towards the work put into this newsletter. Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human
New technique brings frozen brain tissue back to life without harm
This AI-Powered Exoskeleton Could Speed Adoption by the Masses

🧠 Artificial Intelligence
Ilya Sutskever Has a New Plan for Safe Superintelligence
Nvidia becomes world’s most valuable company amid AI boom
New AI Project Aims to Mimic the Human Neocortex
Google DeepMind Shifts From Research Lab to AI Product Factory
▶️ OpenAI Stole Scarlett Johansson's Voice (20:51)
Legal Eagle examines the recent controversy around OpenAI copying Scarlett Johansson’s voice (I wrote about it in detail here) from a legal point of view. The analysis explains how US intellectual property and copyright laws, as well as California’s right of publicity and previous legal cases involving various companies copying celebrities’ voices, might be useful for Johansson if she pursues legal action against OpenAI.
An AI Bot Is (Sort of) Running for Mayor in Wyoming
McDonald’s Terminates Its Drive-Through Ordering AI Assistant

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?
🤖 Robotics
Let Slip the Robot Dogs of War
Apollo humanoid robot in tests by Apptronik and GXO for warehouse use
China: World’s first 3D e-skin gives robots human-like touching sense

🧬 Biotechnology
▶️ Growing Living Neurons to Play...Doom? (28:12)
One of my favourite crazy scientists on YouTube, The Thought Emporium, continues his quest to grow rat neurons and teach them to play Doom. This video focuses on getting signals from the neurons and stimulating them.
AlphaFold 3 Angst: Limited Accessibility Stirs Outcry from Researchers
An Opinionated AlphaFold3 Field Guide

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"