Anthropic raises the industry bar for intelligence - Weekly News Roundup - Issue #472

Plus: Ilya Sutskever is back; Nvidia becomes the world's most valuable company; another company trials a humanoid robot; a military robot-dog arms race; a mad scientist grows rat neurons to play Doom

Conrad Gray

Jun 21

Hello and welcome to Weekly News Roundup Issue #472. This week, Anthropic released their latest model, Claude 3.5 Sonnet, which makes a strong bid for the top spot in the leaderboards.

In other news, Ilya Sutskever is back with a new AGI lab. Meanwhile, Nvidia has become the most valuable company in the world. Over in robotics, another company is trialling a humanoid robot, and a military robot-dog arms race between China and the US has begun. We will finish with a deep dive into how AlphaFold 3 works and a story about a mad scientist growing rat neurons to play Doom.

I hope you enjoy this week’s issue!

Anthropic has released Claude 3.5 Sonnet, the newest model in its Claude family and the first Claude 3.5 model available to the public. According to Anthropic, the model raises the industry bar for intelligence, and it makes a strong bid for the top spot in the leaderboards.

If the benchmark results provided by Anthropic are to be believed, Claude 3.5 Sonnet is a massive improvement over Anthropic’s previous flagship model, Claude 3 Opus, and outperforms competitors such as OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro. Claude 3.5 Sonnet also follows the recent trend of making AI models not only smarter but also faster: according to Anthropic, it runs twice as fast as Claude 3 Opus.

Claude 3.5 Sonnet benchmarks — Source: Anthropic

Anthropic has also revealed that in their internal agentic coding evaluations, Claude 3.5 Sonnet solved 64% of problems, compared to their previous flagship model, Claude 3 Opus, which solved only 38% of problems in the same test. Additionally, Anthropic reports that Claude 3.5 Sonnet is better at fixing bugs or adding new functionality to open-source projects, given a natural language description of what needs to be done.

When equipped with relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities, says Anthropic. Claude 3.5 Sonnet is also good at translating code, which could make it useful in maintaining or migrating legacy codebases.

Anthropic has not described in detail what the coding test looked like and only provided results in the Claude 3.5 Sonnet Model Card Addendum.

Claude 3.5 Sonnet is also better at understanding and working with visual inputs. Anthropic claims its latest model outperforms its competitors in almost all the tests it conducted, even if by a slight margin.

At the moment, Claude 3.5 Sonnet can take in only images or PDFs and cannot analyse videos.

Claude 3.5 Sonnet vision benchmark results. Source: Anthropic

As with all first-party benchmark results, I recommend taking these with a grain of salt and treating them as marketing material. For a better picture of how Claude 3.5 Sonnet compares to other models, check independent benchmarks and leaderboards such as LMSYS Chatbot Arena or HELM (at the time of writing, Claude 3.5 Sonnet is not yet listed on either). However, if Anthropic's claims are true, Claude 3.5 Sonnet should jump close to the top of those rankings very soon.

Alongside Claude 3.5 Sonnet, Anthropic is also releasing Artifacts, a UI improvement for tasks such as code generation, working with text documents, or data analysis: Claude opens a preview window next to the chat and displays its output there. This makes working with Claude much easier than working with ChatGPT and other chatbots. For tasks like code generation, where you go back and forth guiding the chatbot towards what you want, a preview next to the chat saves a lot of scrolling up and down and makes the overall experience much more pleasant.

The only issue I have with Artifacts is that after coming back to the chat from another conversation, the preview window is not open by default. It can be reopened through the Chat controls menu but I expected it to be there when I came back to the conversation.

Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. Claude Pro and Team plan subscribers get higher rate limits, allowing them to send 5 times more messages to Claude 3.5 Sonnet compared to those on the Free plan. It is also available via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.

Access to Claude 3.5 Sonnet through the Anthropic API is priced competitively. Anthropic’s newest model costs $3 per million input tokens and $15 per million output tokens, much cheaper than Claude 3 Opus at $15 per million input tokens and $75 per million output tokens (making Claude 3.5 Sonnet 5x cheaper for both input and output).

For comparison, OpenAI charges $5 per million input tokens and $15 per million output tokens to use GPT-4o through the OpenAI API. Meanwhile, Google asks $3.50 per million input tokens and $10.50 per million output tokens to use Gemini 1.5 Pro.
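To make these prices easier to compare, here is a minimal Python sketch that estimates the bill for a single workload. The per-million-token prices are the ones quoted above; the `request_cost` helper and the workload size (2 million input tokens, 500k output tokens) are made up for illustration and do not come from any provider's SDK.

```python
# Prices in USD per million tokens (input, output), as quoted in this issue.
PRICES = {
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
    "GPT-4o": (5.00, 15.00),
    "Gemini 1.5 Pro": (3.50, 10.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one workload for the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000_000, 500_000):.2f}")
```

On this made-up workload, Claude 3.5 Sonnet comes out at $13.50, against $67.50 for Claude 3 Opus, $17.50 for GPT-4o, and $12.25 for Gemini 1.5 Pro. The ranking depends on the mix of input and output tokens, so it is worth plugging in your own numbers.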

Claude 3.5 Sonnet has a 200k token context window. GPT-4o’s context window is 128k tokens long, while Gemini 1.5 Pro offers a massive 1 million token context window, soon to be expanded to 2 million tokens.

With Claude 3.5 Sonnet, Anthropic presents a new level of performance for mid-tier models. If the benchmark results published by Anthropic are to be believed, Claude 3.5 Sonnet is on par with, if not better than, GPT-4o and Google Gemini 1.5 Pro. And let’s not forget that Sonnet is the mid-tier model in the Claude family. Anthropic promises to release the remaining models in the Claude 3.5 family, the light and fast Haiku and the most powerful Opus, later this year.

I will keep my eyes on Claude 3.5 Opus, as it has the potential to raise the bar for other models quite significantly, judging by how much of an improvement Claude 3.5 Sonnet is.

🦾 More than a human

New technique brings frozen brain tissue back to life without harm
Scientists in China have developed a technique that can revive frozen human brain tissue without damaging it. Using a novel chemical concoction, they froze and thawed human brain tissue and found that it functioned normally afterwards. The researchers hope their method will help develop reliable cryopreservation technology. The same technique could also be useful in cryonics, the practice of cryogenically preserving human bodies or brains in the hope that they can be revived in the future.

This AI-Powered Exoskeleton Could Speed Adoption by the Masses
Researchers at North Carolina State University have developed an AI-powered exoskeleton that can seamlessly adapt to any user without extensive training. They achieved this by training the AI model that controls the exoskeleton entirely in simulation, combining musculoskeletal models with reinforcement learning. Tests showed significant energy savings for users, and the method holds promise for broader applications, including aiding older adults and people with neurological conditions.

🧠 Artificial Intelligence

Ilya Sutskever Has a New Plan for Safe Superintelligence
Ilya Sutskever, former Chief Scientist and co-founder of OpenAI, is back and bringing a new AI lab to the game, named Safe Superintelligence Inc., which has only one goal and one product: a safe superintelligence. Alongside Sutskever, the company was founded by Daniel Gross, an investor and former AI lead at Apple, and Daniel Levy, who worked with Sutskever at OpenAI. The company is currently looking for people to join their “lean, crack team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else.” Sutskever did not disclose who is backing Safe Superintelligence or how much he has raised.

Nvidia becomes world’s most valuable company amid AI boom
Nvidia did it. After more than a year of relentless growth, riding high on the generative AI wave, Nvidia has become the most valuable company in the world, worth $3.34 trillion.

New AI Project Aims to Mimic the Human Neocortex
The Thousand Brains Project is a new, ambitious, and open-source initiative aimed at developing a new AI platform by reverse engineering the neocortex and creating a new kind of AI model closely mimicking how the human brain works. The project is funded by the Gates Foundation, which will invest a minimum of $2.69 million over two years. The Thousand Brains Project hopes to partner with electronics companies, government agencies, and university researchers to explore potential applications for its new platform.

Google DeepMind Shifts From Research Lab to AI Product Factory
DeepMind has been going through a transformation since the generative AI boom started, focusing more on commercial applications of AI and shipping products like Gemini instead of working on research projects such as AlphaFold, according to this article. The article also reveals tension and frustration inside the lab, with some employees saying that pure research is being given short shrift and describing how the way they work has changed since DeepMind and Google Brain were combined to create Google DeepMind.

▶️ OpenAI Stole Scarlet Johansson's Voice (20:51)

Legal Eagle examines the recent controversy around OpenAI copying Scarlett Johansson’s voice (I wrote about it in detail here) from a legal point of view. The analysis explains how US intellectual property and copyright laws, as well as California’s right of publicity and previous legal cases involving various companies copying celebrities’ voices, might be useful for Johansson if she pursues legal action against OpenAI.

An AI Bot Is (Sort of) Running for Mayor in Wyoming
Victor Miller, running for mayor of Cheyenne, Wyoming, has proposed an unusual campaign promise: delegating decision-making to an AI bot named VIC (Virtual Integrated Citizen). Built on OpenAI’s ChatGPT, VIC would analyze documents and make policy recommendations, with Miller acting as the bot's human representative. However, this approach raises legal questions, as AI bots cannot run for office, and candidates must be real people. Additionally, VIC is apparently violating OpenAI’s policies and may be shut down by OpenAI.

McDonald’s Terminates Its Drive-Through Ordering AI Assistant
After a two-year trial run, McDonald’s has decided to stop using its AI voice assistant to take drive-through orders. The system was installed at over 100 drive-throughs, but it seems it did not meet expectations.

🤖 Robotics

Let Slip the Robot Dogs of War
A new arms race between China and the US has begun. Recently, Chinese soldiers trained with a robot dog that had a gun attached to its back, yet another example of the Chinese army experimenting with using robots on the battlefield. The US military is also testing how to effectively use four-legged robots for both combat and non-combat roles, such as patrolling bases. Although these robots lack the agility and speed necessary for chaotic battlefield conditions, their presence raises alarms and concerns about the ethics of using autonomous weapon systems.

Apollo humanoid robot in tests by Apptronik and GXO for warehouse use
Another humanoid robot has landed a test program in a real workplace. Apptronik’s Apollo will begin trials in GXO warehouses. GXO has already been testing Agility Robotics’ Digit, which is also being trialled at Amazon. GXO is the second trial customer for Apollo, which previously secured a trial program with Mercedes. Other humanoid robots have also begun tests in real workplaces to see whether they live up to the hype.

China: World’s first 3D e-skin gives robots human-like touching sense
Chinese scientists have developed a 3D electronic skin that can be applied like a band-aid to monitor real-time health data. This innovation has applications in biomedical diagnosis as well as robotics. It could be used in medical robots for diagnostics or applied directly to human skin for continuous health monitoring.

🧬 Biotechnology

▶️ Growing Living Neurons to Play...Doom? (28:12)

One of my favourite crazy scientists on YouTube, The Thought Emporium, continues his quest to grow rat neurons and teach them to play Doom. This video focuses on getting signals from the neurons and stimulating them.

AlphaFold 3 Angst: Limited Accessibility Stirs Outcry from Researchers
The release of AlphaFold 3 was met with excitement but also with disappointment. Unlike AlphaFold 2, AlphaFold 3 was not made open-source and is only available via AlphaFold Server, which restricts inputs and limits the number of requests per day. This article shares the outcry and concerns some scientists have about AlphaFold 3 and highlights ongoing challenges in balancing innovation, corporate interests, and the scientific community's need for open and reproducible research.

An Opinionated AlphaFold3 Field Guide
Dimension Research has published a detailed explanation of how AlphaFold 3 works, walking through every layer of the model and describing its function. The article also serves as a review of AlphaFold 3 based on what is known about the model so far. I recommend it if you want a good understanding of how AlphaFold 3 works.

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"
