Claude 3.7 Sonnet and GPT-4.5 - Sync #508
I hope you enjoy this free post. If you do, please like ❤️ or share it, for example by forwarding this email to a friend or colleague. Writing this post took around eight hours. Liking or sharing it takes less than eight seconds and makes a huge difference. Thank you!

Plus: Alexa+; Google AI co-scientist; humanoid robots for the home from Figure and 1X; a miracle HIV medicine; a startup making glowing rabbits; and more!
Hello and welcome to Sync #508! This week, both Anthropic and OpenAI released their newest models, Claude 3.7 Sonnet and GPT-4.5, respectively, and we’ll take a closer look at what both companies have brought to the table.

Elsewhere in AI, Amazon announced a new upgraded Alexa powered by Anthropic’s Claude. Meanwhile, Google released an AI co-scientist, OpenAI rolled out Deep Research, Sora arrived in the UK and EU, and DeepSeek accelerated the timeline for releasing its next model, R2.

In robotics, Figure and 1X announced their humanoid robots for the home. Additionally, researchers from the RAI Institute taught Spot to run faster and built a bicycle-riding robot capable of doing some impressive tricks.

Beyond that, this issue of Sync also features a paper from Meta on decoding brainwaves into text, a miracle HIV medicine, a startup promising to deliver glow-in-the-dark rabbits and other fantastical animals as pets, and more!

Enjoy!

Claude 3.7 Sonnet and GPT-4.5

This week, we have seen the release of not one but two new models from leading AI labs, OpenAI and Anthropic. In this article, we will take a closer look at both models and what they tell us about the future trajectories of AI development.

Claude 3.7 Sonnet—Anthropic’s first hybrid reasoning model

Let’s start with Claude 3.7 Sonnet—Anthropic’s first hybrid reasoning model. According to Anthropic, this means their newest model is both a large language model and a reasoning model in one. When I first heard rumours about this hybrid approach, I thought Anthropic’s new model would be able to dynamically switch between a fast, LLM-based mode and a slower but more powerful reasoning mode based on the prompt, thus removing the need for the user to decide which mode to use. However, that did not happen, and we still have to choose which mode we want to use (although Claude’s UI is a bit cleaner in that regard compared to ChatGPT’s).

Similar to what we have seen in other reasoning models, in Extended Thinking mode Claude takes some time to “think” about the prompt and self-reflects before answering it, thus making it perform better on math, physics, instruction-following, coding, and many other tasks. To support these claims, Anthropic released a number of benchmark results showing an uplift in performance with Extended Thinking compared to Claude 3.5 Sonnet. When compared to its competitors, Anthropic’s new model with Extended Thinking is usually close to other reasoning models.
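Because the hybrid design means one model serves both modes, switching to Extended Thinking through the API comes down to a request parameter rather than a different model. Below is a minimal sketch, assuming the Anthropic Python SDK and the `thinking` parameter as documented at the model’s release; the model name, token budgets, and prompts are placeholders, so check Anthropic’s docs before relying on them.

```python
# Minimal sketch: calling Claude 3.7 Sonnet with and without Extended Thinking.
# Assumes the official Anthropic Python SDK (`pip install anthropic`) and an
# ANTHROPIC_API_KEY environment variable; budgets and prompts are illustrative.
import anthropic

client = anthropic.Anthropic()

# Standard (fast, LLM-style) mode: no `thinking` parameter.
fast = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarise special relativity in two sentences."}],
)

# Extended Thinking mode: same model, with a thinking-token budget.
# `max_tokens` must be large enough to cover both the thinking budget and the answer.
deep = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)

# The Extended Thinking response interleaves "thinking" blocks with the final "text" blocks.
for block in deep.content:
    if block.type == "text":
        print(block.text)
```

The point is that it is the same model either way; the thinking budget simply controls how long it deliberates before answering.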
Additionally, Anthropic released benchmark results for coding and agentic tasks, both presenting Claude 3.7 Sonnet as a better option than Claude 3.5 Sonnet, OpenAI o1 and o3-mini (high), and DeepSeek R1.
Anthropic also got creative in showing how good Claude 3.7 Sonnet is by making it play the Game Boy classic Pokémon Red. Claude 3.7 Sonnet progressed much further in the game, earning gym badges, whereas previous versions of Claude got stuck early on.
Please bear in mind that these are benchmark results provided by Anthropic and should be treated as marketing and taken with a grain of salt (this also applies to other companies).

Claude 3.7 Sonnet is now available on all Claude plans—including Free, Pro, Team, and Enterprise—as well as the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. However, the Extended Thinking mode is only available on paid plans. The API cost for Claude 3.7 Sonnet remains the same as its predecessor’s—$3 per million input tokens and $15 per million output tokens.

One interesting thing Anthropic showed us this week is Claude Code—Anthropic’s first agentic coding tool. Claude Code is designed to assist developers with tasks such as searching and reading code, editing files, running tests, and managing GitHub commits. According to Anthropic, Claude Code has already become indispensable for its engineering teams, who use it to streamline development by automating complex processes such as debugging and large-scale refactoring. You can also tell Claude Code is geared towards software developers, since it is a terminal tool. Claude Code is currently in a limited research preview. If you want to try it, you can find installation instructions on GitHub, though you will need an Anthropic account.

GPT-4.5—meh?

OpenAI’s newest model, GPT-4.5, is an interesting release, to say the least. Ten days before the release, Sam Altman was hyping the upcoming GPT-4.5 model by saying on X that “trying GPT-4.5 has been much more of a ‘feel the AGI’ moment among high-taste testers than i expected.” Well, the reality turned out to be a bit different.

As Sam Altman said in a tweet announcing the new model, GPT-4.5 is a giant and expensive model. OpenAI won’t disclose details to back the first claim (although the company confirmed it is its largest model to date), but the fact that GPT-4.5 is the most expensive model in OpenAI’s API catalogue supports the second. Compared to GPT-4o, the new model is 15 to 30 times more expensive. It even costs more to use GPT-4.5 via the API than OpenAI’s reasoning models o1 or o3-mini.
One might expect an equal uplift in performance to justify the steep price increase. However, if you look into the benchmark results provided by OpenAI, you won’t find 15 to 30 times more performance compared to previous models.
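To make the price gap concrete, here is a quick back-of-the-envelope comparison. The per-token prices are my assumptions based on the list prices reported at launch (roughly $2.50/$10 per million input/output tokens for GPT-4o versus $75/$150 for GPT-4.5), so treat the numbers as illustrative rather than authoritative.

```python
# Back-of-the-envelope API cost comparison for a single request.
# Prices are assumed launch list prices (USD per million tokens) and may change.
PRICES = {
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")

# gpt-4o:  $0.0100
# gpt-4.5: $0.2250  -> roughly 22x more expensive for this mix,
# with the ratio ranging from 15x (output-heavy) to 30x (input-heavy requests).
```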
In some benchmarks, GPT-4.5 outperforms GPT-4o, with human testers preferring GPT-4.5’s answers in everyday and professional queries, as well as in queries requiring creativity.
However, a look into the GPT-4.5 System Card reveals benchmarks—such as OpenAI’s Research Engineer interview—where the new model does exceed its predecessor (though not by much) but falls behind o1 or o3-mini. In some cases, both GPT-4.5 and the o-series models are significantly outperformed by Deep Research.

GPT-4.5 is also supposed to hallucinate less than other OpenAI models. The hallucination graph included in the announcement shows a massive drop compared to GPT-4o or o3-mini, but not so much compared to o1.
However, in the System Card, OpenAI shows that GPT-4.5 scored a lower hallucination rate than GPT-4o but sits at roughly the same level as o1 (although GPT-4.5 scores higher on accuracy in that benchmark).
Additionally, OpenAI claims GPT-4.5 is its “best model for chat yet” and has greater “EQ”, or emotional intelligence. AI Explained checked that last claim and found that GPT-4.5 failed his EQ evaluations, while Claude 3.7 Sonnet provided the expected answers.

At the time of writing, GPT-4.5 is only available to ChatGPT Pro subscribers. OpenAI promises to begin rolling it out to Plus and Team users next week, then to Enterprise and Edu users the following week. GPT-4.5 currently supports internet search and file and image uploads. However, GPT‑4.5 does not currently support multimodal features like Voice Mode, video, and screen sharing in ChatGPT.

I might be harsh in criticising OpenAI, especially considering that Claude 3.7 Sonnet without Extended Thinking is also not a massive improvement over Claude 3.5 Sonnet. However, Anthropic did not raise the cost of accessing its latest model through the API by 15 to 30 times. Additionally, Claude 3.7 Sonnet is available to all paying subscribers, not just those who pay $200 per month. GPT-4.5 comes with a significantly higher price tag than GPT-4o or even the o-series of reasoning models, yet it does not justify this cost with a proportional increase in performance.

There is also one more thing worth mentioning—in the first published version of the GPT-4.5 System Card, OpenAI stated that “GPT-4.5 is not a frontier model.” The latest version no longer contains that sentence.

What Claude 3.7 Sonnet and GPT-4.5 tell us about future AI developments

What GPT-4.5 shows is that we might have reached the point of diminishing returns when it comes to non-reasoning AI models. OpenAI has thrown a massive amount of computing power into creating its largest and most expensive model, and all that investment did not translate into an equally massive rise in performance. Sam Altman outlined in his roadmap for GPT-4.5 and GPT-5 that GPT-4.5 is OpenAI’s last non-chain-of-thought model.

As DeepSeek R1 showed us a couple of weeks ago, and o3 before it, the next gains in performance will not come from training ever larger language models but from relying more on test-time compute—creating AI models that take more time to “reason” before producing an answer. That’s the path the leading AI companies believe will lead to the next breakthroughs and, eventually, to AGI.
If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it. For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous support towards the work put into this newsletter. Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

Brain-to-Text Decoding: A Non-invasive Approach via Typing

First two-way adaptive brain-computer interface enhances communication efficiency

Supporting the first steps in your longevity career

🧠 Artificial Intelligence

Introducing Alexa+, the next generation of Alexa

Accelerating scientific breakthroughs with an AI co-scientist

▶️ there is nothing new here (30:31)
Angela Collier, a science communicator and YouTuber, responds to an article in New Scientist about the recently announced Google AI co-scientist. She criticises how AI advancements are often exaggerated in the media, misleading the public into believing AI can make groundbreaking discoveries when, in reality, it only reorganises existing information. Additionally, she warns about the monetisation strategies of AI companies, the ethical concerns of AI training on stolen data, and the risks of institutions becoming dependent on these tools.

OpenAI rolls out deep research to paying ChatGPT users

DeepSeek rushes to launch new AI model as China goes all in

Apple will spend more than $500 billion in the U.S. over the next four years

OpenAI’s Sora is now available in the EU, UK

DeepSeek goes beyond “open weights” AI with plans for source code release

Rabbit shows off the AI agent it should have launched with

US AI Safety Institute could face big cuts

The EU AI Act is Coming to America

▶️ Terence Tao - Machine-Assisted Proofs (59:11)
Terence Tao, considered one of the greatest mathematicians of all time, explores in this talk how AI and machine-assisted proofs can transform mathematics. He highlights how mathematicians have long relied on computational tools—from the abacus to logarithmic tables—and that the introduction of AI-powered tools is simply the next natural step. He recognises the transformative potential of these tools, such as enabling large-scale experimental mathematics, but does not yet see a definitive "killer app." Tao also emphasises key challenges, including AI's struggles with arithmetic, the inefficiency of formalisation, and the critical need for high-quality mathematical databases to fully unlock AI's potential in mathematics.

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?

🤖 Robotics

1X, a Norwegian humanoid robotics startup, presents Gamma, a humanoid robot designed for household tasks. Unlike other such robots, Gamma features a soft, fabric skin for flexible and dynamic movements. The company envisions Gamma being used for cleaning and managing homes, as well as serving as a companion capable of having conversations, collaborating on various tasks, and even tutoring. 1X has not disclosed when the robot will be available for purchase or its price.

In this video, Figure demonstrates how its humanoid robot, Figure 02, can be used in logistics to pick up packages from a conveyor belt. The video showcases the robot's ability to identify objects and its dexterity.
Figure will start ‘alpha testing’ its humanoid robot in the home in 2025

Reinforcement Learning Triples Spot’s Running Speed
Using reinforcement learning, researchers from the RAI Institute increased the running speed of Spot, Boston Dynamics’ four-legged robot, from 1.6 m/s to 5.2 m/s, more than tripling its original factory speed. Moreover, the fast-running gait is not biologically inspired but optimised for the robot’s mechanics. Spot’s default control system (MPC) models and optimises movement in real time but has rigid limitations. Reinforcement learning, in contrast, trains offline with complex models in simulations, leading to more efficient and adaptive control policies. With this result, the RAI Institute has demonstrated that reinforcement learning is a generalisable tool that can expand robot capabilities beyond traditional algorithms, enabling robots to move more efficiently, quietly, and reliably in various environments.

▶️ Stunting with Reinforcement Learning (1:23)
Researchers from the RAI Institute present the Ultra Mobile Vehicle, a robotic bike that can not only ride but also jump and perform various tricks. This project showcases the potential of reinforcement learning, and the results are truly impressive.

Swarms of small robots could get big stuff done

🧬 Biotechnology

Making a “Miracle” HIV Medicine

Your Next Pet Could Be a Glowing Rabbit

💡 Tangents

This artist collaborates with AI and robots

Y Combinator Takes Heat for Helping Launch a Startup That Spies on Factory Workers

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"