Anthropic raises the industry bar for intelligence - Weekly News Roundup - Issue #472
Plus: Ilya Sutskever is back; Nvidia becomes the world's most valuable company; another company trials a humanoid robot; a military robot-dog arms race; a mad scientist grows rat neurons to play Doom

Hello and welcome to Weekly News Roundup Issue #472. This week, Anthropic released their latest model, Claude 3.5 Sonnet, which makes a strong bid for the top spot in the leaderboards. In other news, Ilya Sutskever is back with a new AGI lab. Meanwhile, Nvidia has become the most valuable company in the world. Over in robotics, another company is trialling a humanoid robot, and a military robot-dog arms race between China and the US has begun. We will finish with a deep dive into how AlphaFold 3 works and a story about a mad scientist growing rat neurons to play Doom. I hope you enjoy this week’s issue!

Anthropic has released a new model in their Claude family, Claude 3.5 Sonnet, which is also the first Claude 3.5 model released to the public. According to Anthropic, its newest model raises the industry bar for intelligence. If the benchmark results provided by Anthropic are to be believed, Claude 3.5 Sonnet is a massive improvement over Anthropic’s previous flagship model, Claude 3 Opus, and outperforms competitors such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro. Claude 3.5 Sonnet also follows the recent trend of making AI models not only smarter but also faster, running twice as fast as Claude 3 Opus.

Anthropic has also revealed that in their internal agentic coding evaluations, Claude 3.5 Sonnet solved 64% of problems, while Claude 3 Opus solved only 38% of problems in the same test.
Anthropic also reports that Claude 3.5 Sonnet is better at fixing bugs or adding new functionality to open-source projects, given a natural language description of what needs to be done. When equipped with relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities, says Anthropic. Claude 3.5 Sonnet is also good at translating code, which could make it useful in maintaining or migrating legacy codebases. Anthropic has not described in detail what the coding test looked like and only provided results in the Claude 3.5 Sonnet Model Card Addendum.

Claude 3.5 Sonnet is also better at understanding and working with visual inputs. Anthropic claims its latest model outperforms its competitors in almost all the vision tests it conducted, even if by a slight margin. At the moment, Claude 3.5 Sonnet can take in only images or PDFs and cannot analyse videos.

As with every first-party benchmark result, I recommend taking these numbers with a grain of salt and seeing them as marketing material. For a better picture of how Claude 3.5 Sonnet compares to other models, I recommend checking out independent benchmarks and leaderboards, such as LMSYS Chatbot Arena or HELM (at the time I am writing this, Claude 3.5 Sonnet is not listed on those leaderboards yet). However, if Anthropic's claims are true, then Claude 3.5 Sonnet should jump close to the top of those rankings very soon.

Together with Claude 3.5 Sonnet, Anthropic is also releasing Artifacts, a UI improvement for working with Claude. For tasks such as code generation, working with text documents, or data analysis, Claude will open a preview window next to the chat where it will display its output. This feature makes working with Claude much easier compared to ChatGPT and other chatbots.
For tasks like code generation, where you have to go back and forth and guide the chatbot to generate what you want, seeing a preview next to the chat saves a lot of scrolling up and down, and makes the overall experience much more pleasant. The only issue I have with Artifacts is that after coming back to the chat from another conversation, the preview window is not open by default. It can be reopened through the Chat controls menu, but I expected it to be there when I came back to the conversation.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. Claude Pro and Team plan subscribers get higher rate limits, allowing them to send 5 times more messages to Claude 3.5 Sonnet compared to those on the Free plan. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Access to Claude 3.5 Sonnet through the Anthropic API is priced competitively. Anthropic’s newest model costs $3 per million input tokens and $15 per million output tokens, much cheaper than Claude 3 Opus, which costs $15 per million input tokens and $75 per million output tokens (making Claude 3.5 Sonnet 5x cheaper for both input and output). For comparison, OpenAI charges $5 per million input tokens and $15 per million output tokens to use GPT-4o through the OpenAI API. Meanwhile, Google asks $3.50 per million input tokens and $10.50 per million output tokens to use Gemini 1.5 Pro.

Claude 3.5 Sonnet has a 200k token context window. GPT-4o’s context window is 128k tokens long, while Gemini 1.5 Pro offers a massive 1 million token context window, soon to be expanded to 2 million tokens.

With Claude 3.5 Sonnet, Anthropic presents a new level of performance for mid-tier models. If the benchmark results published by Anthropic are to be believed, Claude 3.5 Sonnet is on par with, if not better than, GPT-4o and Google Gemini 1.5 Pro. And let’s not forget that Sonnet is the mid-tier model in the Claude family.
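As a quick check on the API prices quoted earlier in this section, here is a minimal Python sketch that turns the per-million-token prices into a per-request cost and a price ratio between two models. The prices are the ones stated in this issue and will change over time; the function names and the example workload of 10,000 input tokens and 2,000 output tokens are hypothetical and only there for illustration.

```python
# USD per million tokens (input, output), as quoted in this issue.
# Prices change over time; treat these as a snapshot, not current rates.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "GPT-4o": (5.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_ratio(model_a: str, model_b: str) -> tuple[float, float]:
    """Return how many times pricier model_a is than model_b as (input, output)."""
    a_in, a_out = PRICES[model_a]
    b_in, b_out = PRICES[model_b]
    return a_in / b_in, a_out / b_out


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
    print(price_ratio("Claude 3 Opus", "Claude 3.5 Sonnet"))
```

With these list prices, the ratio between Claude 3 Opus and Claude 3.5 Sonnet comes out to 5 for both input and output tokens. Real bills also depend on how each provider counts tokens, which this sketch does not model.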
Anthropic promises to release the remaining models in the Claude 3.5 family, the light and fast Haiku and the most powerful Opus, later this year. I will keep my eyes on Claude 3.5 Opus, as it has the potential to raise the bar for other models quite significantly, judging by how much of an improvement Claude 3.5 Sonnet is.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it.

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous support towards the work put into this newsletter.

Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human
New technique brings frozen brain tissue back to life without harm
This AI-Powered Exoskeleton Could Speed Adoption by the Masses

🧠 Artificial Intelligence
Ilya Sutskever Has a New Plan for Safe Superintelligence
Nvidia becomes world’s most valuable company amid AI boom
New AI Project Aims to Mimic the Human Neocortex
Google DeepMind Shifts From Research Lab to AI Product Factory

▶️ OpenAI Stole Scarlett Johansson's Voice (20:51)
Legal Eagle examines the recent controversy around OpenAI copying Scarlett Johansson’s voice (which I have written about in detail) from a legal point of view. The analysis explains how US intellectual property and copyright laws, as well as California’s right of publicity and previous legal cases involving various companies copying celebrities’ voices, might be useful for Johansson if she pursues legal action against OpenAI.

An AI Bot Is (Sort of) Running for Mayor in Wyoming
McDonald’s Terminates Its Drive-Through Ordering AI Assistant

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?
🤖 Robotics
Let Slip the Robot Dogs of War
Apollo humanoid robot in tests by Apptronik and GXO for warehouse use
China: World’s first 3D e-skin gives robots human-like touching sense

🧬 Biotechnology
▶️ Growing Living Neurons to Play...Doom? (28:12)
One of my favourite crazy scientists on YouTube, The Thought Emporium, continues his quest to grow rat neurons and teach them to play Doom. This video focuses on getting signals from the neurons and stimulating them.

AlphaFold 3 Angst: Limited Accessibility Stirs Outcry from Researchers
An Opinionated AlphaFold3 Field Guide

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"