Anthropic raises the industry bar for intelligence - Weekly News Roundup - Issue #472
Plus: Ilya Sutskever is back; Nvidia becomes the world's most valuable company; another company trials a humanoid robot; a military robot-dog arms race; a mad scientist grows rat neurons to play Doom

Hello and welcome to Weekly News Roundup Issue #472. This week, Anthropic released their latest model, Claude 3.5 Sonnet, which makes a strong bid for the top spot on the leaderboards. In other news, Ilya Sutskever is back with a new AGI lab. Meanwhile, Nvidia has become the most valuable company in the world. Over in robotics, another company is trialling a humanoid robot, and a military robot-dog arms race between China and the US has begun. We will finish with a deep dive into how AlphaFold 3 works and a story about a mad scientist growing rat neurons to play Doom. I hope you enjoy this week’s issue!

Anthropic has released Claude 3.5 Sonnet, the newest model in their Claude family and the first Claude 3.5 model available to the public. According to Anthropic, its newest model raises the industry bar for intelligence. If the benchmark results provided by Anthropic are to be believed, Claude 3.5 Sonnet is a massive improvement over Anthropic’s previous flagship model, Claude 3 Opus, and outperforms competitors such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro. Claude 3.5 Sonnet also follows the recent trend of making AI models not only smarter but also faster: it is twice as fast as Claude 3 Opus.

Anthropic has also revealed that in their internal agentic coding evaluations, Claude 3.5 Sonnet solved 64% of problems, while Claude 3 Opus solved only 38% of problems in the same test.
Additionally, Anthropic reports that Claude 3.5 Sonnet is better at fixing bugs or adding new functionality to open-source projects, given a natural language description of what needs to be done. When equipped with relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities, says Anthropic. Claude 3.5 Sonnet is also good at translating code, which could make it useful in maintaining or migrating legacy codebases. Anthropic has not described in detail what the coding test looked like and only provided results in the Claude 3.5 Sonnet Model Card Addendum.

Claude 3.5 Sonnet is also better at understanding and working with visual inputs. Anthropic claims its latest model outperforms its competitors in almost all the tests it conducted, even if by a slight margin. At the moment, Claude 3.5 Sonnet can take in only images or PDFs and cannot analyse videos.

As with all first-party benchmark results, I recommend taking these with a grain of salt and seeing them as marketing material. For a better picture of how Claude 3.5 Sonnet compares to other models, I recommend checking out independent benchmarks and leaderboards, such as LMSYS Chatbot Arena or HELM (at the time I am writing this, Claude 3.5 Sonnet is not listed on those leaderboards yet). However, if Anthropic's claims are true, then Claude 3.5 Sonnet should jump close to the top of those rankings very soon.

Together with Claude 3.5 Sonnet, Anthropic is also releasing Artifacts, a UI improvement that makes working with Claude easier. For tasks such as code generation, working with text documents, or data analysis, Claude will open a preview window next to the chat where it will display its output. This feature makes working with Claude much easier compared to ChatGPT and other chatbots.
For tasks like code generation, where you have to go back and forth and guide the chatbot to generate what you want, seeing a preview next to the chat saves a lot of scrolling up and down, and makes the overall experience much more pleasant. The only issue I have with Artifacts is that after coming back to the chat from another conversation, the preview window is not open by default. It can be reopened through the Chat controls menu, but I expected it to be there when I came back to the conversation.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. Claude Pro and Team plan subscribers get higher rate limits, allowing them to send 5 times more messages to Claude 3.5 Sonnet compared to those on the Free plan. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Access to Claude 3.5 Sonnet through the Anthropic API is priced competitively. Anthropic’s newest model costs $3 per million input tokens and $15 per million output tokens. That is much cheaper than Claude 3 Opus, which costs $15 per million input tokens and $75 per million output tokens (making Claude 3.5 Sonnet 5x cheaper for both input and output). For comparison, OpenAI charges $5 per million input tokens and $15 per million output tokens to use GPT-4o through the OpenAI API. Meanwhile, Google asks $3.50 per million input tokens and $10.50 per million output tokens to use Gemini 1.5 Pro.

Claude 3.5 Sonnet has a 200k token context window. GPT-4o’s context window is 128k tokens long, while Gemini 1.5 Pro offers a massive 1 million token context window, soon to be expanded to 2 million tokens.

With Claude 3.5 Sonnet, Anthropic presents a new level of performance for mid-tier models. If the benchmark results published by Anthropic are to be believed, Claude 3.5 Sonnet is on par with, if not better than, GPT-4o and Google Gemini 1.5 Pro. And let’s not forget that Sonnet is the mid-tier model in the Claude family.
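
To make the per-token prices quoted above easier to compare, here is a minimal Python sketch that turns them into a cost per request. The prices are the ones listed in this issue; the function names and the example workload of 10,000 input tokens and 1,000 output tokens are hypothetical and only there for illustration.

```python
# Prices in USD per million tokens (input, output), as quoted in this issue.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "GPT-4o": (5.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_ratio(expensive, cheap):
    """Return how many times pricier `expensive` is than `cheap` as (input, output) ratios."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


# Hypothetical workload: 10,000 input tokens and 1,000 output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f} per request")

# Claude 3 Opus relative to Claude 3.5 Sonnet, for input and output prices.
print(price_ratio("Claude 3 Opus", "Claude 3.5 Sonnet"))  # → (5.0, 5.0)
```

Which model comes out cheapest depends on the mix of input and output tokens, so it is worth plugging in numbers that match your own usage.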
Anthropic promises to release the remaining models in the Claude 3.5 family, the light and fast Haiku and the most powerful Opus, later this year. I will keep my eyes on Claude 3.5 Opus, as it has the potential to raise the bar for other models quite significantly, judging by how much of an improvement Claude 3.5 Sonnet is.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it.

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous support towards the work put into this newsletter.

Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

New technique brings frozen brain tissue back to life without harm

This AI-Powered Exoskeleton Could Speed Adoption by the Masses

🧠 Artificial Intelligence

Ilya Sutskever Has a New Plan for Safe Superintelligence

Nvidia becomes world’s most valuable company amid AI boom

New AI Project Aims to Mimic the Human Neocortex

Google DeepMind Shifts From Research Lab to AI Product Factory

▶️ OpenAI Stole Scarlett Johansson's Voice (20:51)
Legal Eagle examines the recent controversy around OpenAI copying Scarlett Johansson’s voice (I wrote about it in detail here) from a legal point of view. The analysis explains how US intellectual property and copyright laws, as well as California’s right of publicity and previous legal cases involving various companies copying celebrities’ voices, might be useful for Johansson if she pursues legal action against OpenAI.

An AI Bot Is (Sort of) Running for Mayor in Wyoming

McDonald’s Terminates Its Drive-Through Ordering AI Assistant

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?
🤖 Robotics

Let Slip the Robot Dogs of War

Apollo humanoid robot in tests by Apptronik and GXO for warehouse use

China: World’s first 3D e-skin gives robots human-like touching sense

🧬 Biotechnology

▶️ Growing Living Neurons to Play...Doom? (28:12)
One of my favourite crazy scientists on YouTube, The Thought Emporium, continues his quest to grow rat neurons and teach them to play Doom. This video focuses on getting signals from the neurons and stimulating them.

AlphaFold 3 Angst: Limited Accessibility Stirs Outcry from Researchers

An Opinionated AlphaFold3 Field Guide

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"