Claude 3, the new best LLM on the block - Weekly News Roundup - Issue #457
Plus: OpenAI reveals Elon's emails; Unitree's humanoid robot is available for purchase; a Microsoft engineer raises concerns about Copilot Designer and responsible AI; and more!
Welcome to Weekly News Roundup Issue #457. This week, Anthropic released Claude 3, probably the best large language model available today. In other news, OpenAI responded to Elon Musk's lawsuit by publishing emails between Musk and the OpenAI founders. A Microsoft engineer has raised concerns about Copilot Designer and responsible AI. Unitree's humanoid robot is now available for purchase, and more!
Since its release almost a year ago, in mid-March 2023, GPT-4 has been the top large language model available. Every other model released since then has been compared to GPT-4, and for a long time there was a clear gap in capabilities between OpenAI’s top model and its competitors. As 2023 progressed, other companies began to close that gap, inching closer to the mark set by GPT-4, with some models even surpassing it in certain benchmarks.
Recently, the competition has caught up with OpenAI. First, there was Google DeepMind and its Gemini models. This week, Anthropic released its latest model, Claude 3, which the company claims is the most intelligent large language model currently available and sets new industry benchmarks across a wide range of cognitive tasks. Let's investigate these claims and evaluate how intelligent Claude 3 really is.
Like Google Gemini, Claude 3 is not a single model but a family of three models, which are named, in ascending order of capability, Haiku, Sonnet and Opus. Anthropic says that Claude 3 Opus sets “a new standard for intelligence” and that it shows us “the outer limits of what’s possible with generative AI”. The company backs these claims with benchmark results comparing all Claude 3 models with their main competitors: GPT-4, GPT-3.5, Gemini 1.0 Ultra and Gemini 1.0 Pro. The results are quite impressive.
However, what’s missing from this table are results for GPT-4 Turbo and Gemini 1.5 Pro. GPT-4 Turbo is missing because its benchmark results have not been made publicly available. However, someone on Twitter shared unofficial GPT-4 Turbo benchmarks, and they show GPT-4 Turbo scoring higher than Claude 3 Opus. The comparison to Gemini 1.5 Pro is provided in a paper that describes the model and its benchmark results in more detail, and Claude 3 Opus emerges as the better model there.
These results were provided by Anthropic, so let’s take them with a pinch of salt: the company wants to present itself and Claude 3 in the best possible light to attract as many customers as possible. Over the next days and weeks, independent tests will become available and we will have a better picture of how Claude 3 compares to its competitors in real-life applications.
One such comparison was made by AI Explained, who compared Claude 3 Opus with GPT-4 Turbo and Gemini 1.5 Pro across multiple queries and tested how good Claude 3 is at understanding images and reasoning. AI Explained concludes that Claude 3 Opus is “probably the most intelligent model currently available”. It is better at understanding images than the other models, and it is better at reasoning and at answering even tricky questions. Claude 3 Opus also has lower rates of false refusal. AI Explained tested that by asking how to make a party “go down like a bomb”. Only Claude 3 understood the true meaning of the question and provided an answer.
Each of the Claude 3 models has a context window of 200K tokens. However, all three models are capable of accepting inputs exceeding 1 million tokens. The larger context window is currently limited to selected customers who need enhanced processing power. Anthropic also showed that Claude 3 has excellent recall accuracy across a 200K token context window, meaning it can recall a piece of information no matter where in the input text it appeared.
While we are discussing the Claude 3 context window, one of Anthropic’s employees shared an interesting story about internal testing of Claude 3 Opus in which the model suspected it was being evaluated. Some people take stories like these as proof that Claude 3 is sentient, but the same behaviour can be explained by the way large language models work, how Claude 3 was trained and how it was conditioned to behave, as Yannic Kilcher explains in this video.
Opus and Sonnet are available through Anthropic’s API, and Haiku will be available soon. Sonnet can be accessed for free on claude.ai, while Opus is available to Claude Pro subscribers. Sonnet is also available through Amazon Bedrock and in private preview on Google Cloud's Vertex AI Model Garden, with Opus and Haiku coming soon to both platforms.
Overall, Claude 3 Opus is a very good model, possibly the best large language model available today. It is shaping up to be a good alternative to what OpenAI and Google have to offer, and it justifies Anthropic’s $18 billion valuation.
It will be interesting to see where the AI industry goes from here. Anthropic does not believe Claude 3 is “anywhere near its limits” and promises to release frequent updates in the coming months. Meanwhile, other companies won’t stand still. Google might release Gemini 1.5 Ultra and leapfrog Claude 3. Meta is currently training Llama 3. And if rumours are true, OpenAI is already training what could be GPT-5, projected to be released in the second half of the year. In the meantime, OpenAI could release a partially trained GPT-5 as GPT-4.5 to challenge other models and regain the top spot. The next few months are going to be interesting in the AI space.
If you enjoy this post, please click the ❤️ button or share it.
Do you like my work? Consider becoming a paying subscriber to support it. For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is generous support for the work put into this newsletter.
Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.
🦾 More than a human
3D-printed skin closes wounds and contains hair follicle precursors
🧠 Artificial Intelligence
OpenAI and Elon Musk
Microsoft’s engineer raises concerns about Copilot Designer and responsible AI
Inflection-2.5: meet the world's best personal AI
Public trust in AI is sinking across the board
Researchers jailbreak AI chatbots with ASCII art - ArtPrompt bypasses safety measures to unlock malicious queries
🤖 Robotics
▶️ Unitree H1 Breaking humanoid robot speed world record (1:15)
After Agility Robotics’ Digit started being tested at Amazon and Figure announced a massive Series B funding round last week, it is now Unitree’s turn to show what it brings to the growing humanoid robotics scene. Meet Unitree H1, a general-purpose humanoid robot from China that can run at up to 3.3 m/s, jump, walk up and down stairs and, of course, dance. Additionally, the robot is now available for purchase for $150,000, with deliveries starting in Q1 2024.
Watch an autonomous helicopter demo wildfire response skills
Anyware Robotics’ Pixmo Takes Unique Approach to Trailer Unloading
Robotics company Anyware Robotics joins the trailer unloading market with its Pixmo robot. Pixmo uses an off-the-shelf Fanuc robotic arm equipped with suction cups to lift boxes, which are then placed on a built-in conveyor belt. This conveyor belt, which is unique to Pixmo, transports the boxes out of the trailer, significantly accelerating the unloading process. It can achieve a throughput of up to 1,000 boxes per hour, or about one box every 3.6 seconds.
🧬 Biotechnology
Cultivated Biosciences poised to take its plant-based cream to market in 2025
This Swedish startup wants to reduce the cost, and controversy, around stem cells production
Thanks for reading.
Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.
A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!
My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"