Claude 3, the new best LLM on the block - Weekly News Roundup - Issue #457
Plus: OpenAI reveals Elon's emails; Unitree's humanoid robot is available for purchase; Microsoft’s engineer raises concerns about Copilot Designer and responsible AI; and more!

Welcome to Weekly News Roundup Issue #457. This week, Anthropic released Claude 3, probably the best large language model available today. In other news, OpenAI responded to Elon Musk's lawsuit by publishing emails between Musk and the OpenAI founders. A Microsoft engineer has raised concerns about Copilot Designer and responsible AI. Unitree's humanoid robot is now available for purchase, and more!

Since its release almost a year ago, in mid-March 2023, GPT-4 has been the top large language model available. Every other model released since then has been compared to GPT-4. At first, there was a clear gap in capabilities between OpenAI’s top model and its competitors. However, as 2023 progressed, other companies began to close that gap, inching closer to the mark set by GPT-4, with some models even surpassing GPT-4 in certain benchmarks.

Recently, the competition has caught up with OpenAI. First, there was Google DeepMind and its Gemini models. This week, Anthropic released its latest model, Claude 3, which the company claims is the most intelligent large language model currently available and which it says sets new industry benchmarks across a wide range of cognitive tasks. Let's investigate these claims and evaluate how intelligent Claude 3 really is.

Similarly to Google Gemini, Claude 3 is not a single model but a family of three models, which are named, in ascending order of capability, Haiku, Sonnet and Opus. Anthropic says that Claude 3 Opus sets “a new standard for intelligence” and that it shows us “the outer limits of what’s possible with generative AI”. The company backs these claims with benchmark results comparing all Claude 3 models with their main competitors: GPT-4, GPT-3.5, Gemini 1.0 Ultra and Gemini 1.0 Pro. The results are quite impressive.
However, what’s missing from this table are results from GPT-4 Turbo and Gemini 1.5 Pro. The GPT-4 Turbo benchmarks are missing because they have not been made publicly available. However, someone on Twitter shared unofficial GPT-4 Turbo benchmarks, and they show that GPT-4 Turbo scores higher than Claude 3 Opus. The comparison to Gemini 1.5 Pro is provided in a paper describing the model and benchmark results in more detail, and Claude 3 Opus emerges as the better model there.

These results were provided by Anthropic, so let’s take them with a pinch of salt: the company wants to present itself and Claude 3 in the best light possible to attract as many customers as possible. Over the next days and weeks, independent tests will become available and we will have a better picture of how Claude 3 compares to its competitors in real-life applications.

One such comparison was made by AI Explained, who compared Claude 3 Opus with GPT-4 Turbo and Gemini 1.5 Pro across multiple queries and tested how good Claude 3 is at understanding images and reasoning. AI Explained concludes that Claude 3 Opus is “probably the most intelligent model currently available”. In those tests, it was better at understanding images than the other models, and better at reasoning and at answering even tricky questions.

Claude 3 Opus also has lower rates of false refusal. AI Explained tested that by asking how to make a party “go down like a bomb”. Only Claude 3 understood the true meaning of the question and provided an answer.

Each of the Claude 3 models has a context window of 200K tokens. However, all three models are capable of accepting inputs exceeding 1 million tokens. The larger context window is currently limited to selected customers who need enhanced processing power. Anthropic also showed that Claude 3 has excellent recall accuracy across a 200K token context window, meaning it can recall a piece of information no matter where it appears in the input text.
While we are discussing the Claude 3 context window, one of Anthropic’s employees shared an interesting story about internal testing on Claude 3 Opus in which the model suspected it was being evaluated. Some people take stories like these as proof that Claude 3 is sentient, but the same behaviour can be explained by the way large language models work, how Claude 3 was trained and how it was conditioned to behave, as Yannic Kilcher explains in this video.

Opus and Sonnet are available through Anthropic’s API, and Haiku will be available soon. Sonnet can be accessed for free on claude.ai, while Opus is available for Claude Pro subscribers. Sonnet is also available through Amazon Bedrock and in private preview on Google Cloud's Vertex AI Model Garden, with Opus and Haiku coming soon to both platforms.

Overall, Claude 3 Opus is a very good model, possibly the best large language model available today. It is shaping up to be a good alternative to what OpenAI and Google have to offer and justifies Anthropic’s $18 billion valuation.

It will be interesting to see where the AI industry goes from here. Anthropic does not believe Claude 3 is “anywhere near its limits” and promises to release frequent updates in the coming months. Meanwhile, other companies won’t stand still. Google might release Gemini 1.5 Ultra and leapfrog Claude 3. Meta is currently training Llama 3. And if rumours are true, OpenAI is already training what could be GPT-5, projected to be released in the second half of the year. In the meantime, OpenAI could release a partially trained GPT-5 as GPT-4.5 to challenge other models and regain the top spot.

The next few months are going to be interesting in the AI space.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it.

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is a generous support towards the work put into this newsletter.
Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

3D-printed skin closes wounds and contains hair follicle precursors

🧠 Artificial Intelligence

OpenAI and Elon Musk

Microsoft’s engineer raises concerns about Copilot Designer and responsible AI

Inflection-2.5: meet the world's best personal AI

Public trust in AI is sinking across the board

Researchers jailbreak AI chatbots with ASCII art - ArtPrompt bypasses safety measures to unlock malicious queries

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?

🤖 Robotics

▶️ Unitree H1 Breaking humanoid robot speed world record (1:15)
After Agility Robotics’ Digit started being tested at Amazon and Figure announced a massive Series B funding round last week, it is now Unitree’s turn to show what it brings to the growing humanoid robotics scene. Meet Unitree H1, a general-purpose humanoid robot from China that can run at speeds of up to 3.3 m/s, jump, walk up and down stairs and, of course, dance. Additionally, the robot is now available for purchase for $150,000 with deliveries starting in Q1 2024.

Watch an autonomous helicopter demo wildfire response skills

Anyware Robotics’ Pixmo Takes Unique Approach to Trailer Unloading
Robotics company Anyware Robotics joins the trailer unloading market with its Pixmo robot. Pixmo uses an off-the-shelf Fanuc robotic arm equipped with suction cups to lift boxes, which are then placed on a built-in conveyor belt, a feature unique to Pixmo. The conveyor belt transports the boxes out of the trailer, significantly accelerating the unloading process. Pixmo can achieve a throughput of up to 1,000 boxes per hour, or approximately one box every four seconds.
🧬 Biotechnology

Cultivated Biosciences poised to take its plant-based cream to market in 2025

This Swedish startup wants to reduce the cost, and controversy, around stem cells production

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"