Claude 3, the new best LLM on the block - Weekly News Roundup - Issue #457
Plus: OpenAI reveals Elon's emails; Unitree's humanoid robot is available for purchase; a Microsoft engineer raises concerns about Copilot Designer and responsible AI; and more!

Welcome to Weekly News Roundup Issue #457. This week, Anthropic released Claude 3, probably the best large language model available today. In other news, OpenAI responded to Elon Musk's lawsuit by publishing emails between Musk and the OpenAI founders. A Microsoft engineer has raised concerns about Copilot Designer and responsible AI. Unitree's humanoid robot is now available for purchase, and more!

Since its release almost a year ago, in mid-March 2023, GPT-4 has been the top large language model available. Every other model released since then has been compared to GPT-4, and for a long time there was a clear gap in capabilities between OpenAI’s top model and its competitors. As 2023 progressed, other companies began to close that gap, inching closer to the mark set by GPT-4, with some models even surpassing it in certain benchmarks.

Recently, the competition has caught up with OpenAI. First, there was Google DeepMind and its Gemini models. This week, Anthropic released its latest model, Claude 3, which the company claims is the most intelligent large language model currently available and which it says sets new industry benchmarks across a wide range of cognitive tasks. Let's investigate these claims and evaluate how intelligent Claude 3 really is.

Similarly to Google Gemini, Claude 3 is not a single model but a family of three models, which are named, in descending order of capability, Opus, Sonnet and Haiku.

Anthropic says that Claude 3 Opus set “a new standard for intelligence” and that it shows us “the outer limits of what’s possible with generative AI”. The company backs these claims with benchmark results comparing all Claude 3 models with their main competitors: GPT-4, GPT-3.5, Gemini 1.0 Ultra and Gemini 1.0 Pro. The results are quite impressive.
However, what’s missing from Anthropic's benchmark table are results from GPT-4 Turbo and Gemini 1.5 Pro. The GPT-4 Turbo results are missing because they have not been made publicly available. However, someone on Twitter shared unofficial GPT-4 Turbo benchmarks, and they show GPT-4 Turbo scoring higher than Claude 3 Opus. The comparison with Gemini 1.5 Pro appears in a paper that describes the model and its benchmark results in more detail, and there Claude 3 Opus emerges as the better model.

Those are results provided by Anthropic, so let’s take them with a pinch of salt, as the company wants to present itself and Claude 3 in the best light possible to attract as many customers as possible. Over the next days and weeks, independent tests will become available and we will have a better picture of how Claude 3 compares with its competitors in real-life applications.

One such comparison was made by AI Explained, who compared Claude 3 Opus with GPT-4 Turbo and Gemini 1.5 Pro across multiple queries and tested how good Claude 3 is at understanding images and reasoning.

AI Explained concludes that Claude 3 Opus is “probably the most intelligent model currently available”. It is better at understanding images than other models, and it is better at reasoning and answering even tricky questions. Claude 3 Opus also has lower rates of false refusal. AI Explained tested that by asking how to make a party “go down like a bomb”. Only Claude 3 understood the true meaning of the question and provided an answer.

Each of the Claude 3 models has a context window of 200K tokens. However, all three models are capable of accepting inputs exceeding 1 million tokens. The larger context window is currently limited to selected customers who need enhanced processing power. Anthropic also showed that Claude 3 has excellent recall accuracy across a 200K token context window, meaning it can recall information no matter where in the input text it appears.
While we are discussing the Claude 3 context window, one of Anthropic’s employees shared an interesting story about internal testing on Claude 3 Opus in which the model suspected it was being evaluated.

Some people take stories like these as proof that Claude 3 is sentient, but the same behaviour can be explained by the way large language models work, how Claude 3 was trained and how it was conditioned to behave, as Yannic Kilcher explains in this video.

Opus and Sonnet are available through Anthropic’s API, and Haiku will be available soon. Sonnet can be accessed for free on claude.ai, while Opus is available for Claude Pro subscribers. Sonnet is also available through Amazon Bedrock and in private preview on Google Cloud's Vertex AI Model Garden, with Opus and Haiku coming soon to both platforms.

Overall, Claude 3 Opus is a very good model, possibly the best large language model available today. It is shaping up to be a good alternative to what OpenAI and Google have to offer and justifies Anthropic’s $18 billion valuation.

It will be interesting to see where the AI industry goes from here. Anthropic does not believe Claude 3 is “anywhere near its limits” and promises to release frequent updates in the coming months. Meanwhile, other companies won’t stand still. Google might release Gemini 1.5 Ultra and leapfrog Claude 3. Meta is currently training Llama 3. And if rumours are true, OpenAI is already training what could be GPT-5, projected to be released in the second half of the year. In the meantime, OpenAI could release a partially trained GPT-5 as GPT-4.5 to challenge other models and regain the top spot.

The next few months are going to be interesting in the AI space.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it.

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is generous support for the work put into this newsletter.
Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

3D-printed skin closes wounds and contains hair follicle precursors

🧠 Artificial Intelligence

OpenAI and Elon Musk

Microsoft’s engineer raises concerns about Copilot Designer and responsible AI

Inflection-2.5: meet the world's best personal AI

Public trust in AI is sinking across the board

Researchers jailbreak AI chatbots with ASCII art - ArtPrompt bypasses safety measures to unlock malicious queries

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?

🤖 Robotics

▶️ Unitree H1 Breaking humanoid robot speed world record (1:15)
After Agility Robotics’ Digit started being tested at Amazon and Figure announced a massive Series B funding round last week, it is now Unitree’s turn to show what it brings to the growing humanoid robotics scene. Meet Unitree H1, a general-purpose humanoid robot from China that can run at up to 3.3 m/s, jump, walk up and down stairs and, of course, dance. Additionally, the robot is now available for purchase for $150,000, with deliveries starting in Q1 2024.

Watch an autonomous helicopter demo wildfire response skills

Anyware Robotics’ Pixmo Takes Unique Approach to Trailer Unloading
Robotics company Anyware Robotics joins the trailer unloading market with its Pixmo robot. Pixmo uses an off-the-shelf Fanuc robotic arm equipped with suction cups to lift boxes, which are then placed on a built-in conveyor belt. This conveyor belt, which is unique to Pixmo, transports the boxes out of the trailer, significantly accelerating the unloading process. It can achieve a throughput of up to 1,000 boxes per hour, or approximately one box every four seconds.
🧬 Biotechnology

Cultivated Biosciences poised to take its plant-based cream to market in 2025

This Swedish startup wants to reduce the cost, and controversy, around stem cells production

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"