How Devin Signals the Age of AI Agent - Weekly News Roundup - Issue #458
Plus: humanoid robot understands human speech; Nvidia gets sued over AI use of copyrighted works; Mercedes-Benz will trial a humanoid robot; DeepMind SIMA; and more!
Hello and welcome to Weekly News Roundup Issue #458. This was a week full of big news in the world of AI and robotics. We will take a closer look at Devin, the “first AI software engineer”, and what it can tell us about the future of AI assistants and AI agents. In other news, Google DeepMind released SIMA, an AI agent playing 3D games, MEPs approved the world's first comprehensive AI law, and Nvidia got sued over AI use of copyrighted works. Meanwhile, OpenAI got into some trouble (again), and that needs a separate article to cover it properly. It was also a big week for humanoid robots: Figure has shown the fruits of its partnership with OpenAI, and Mercedes-Benz will trial Apptronik's humanoid robots. Enjoy!
On March 12th, 2024, Cognition AI emerged from stealth mode and showed the world Devin, “the first AI software engineer”. The software engineering community had mixed reactions to this news. Some responded with fear, anxiety, or anger, while others were more excited. Let's take a closer look at Devin, what it means for the future of software engineering, and what it says about future trends in AI.
According to Cognition, Devin is “the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.” Unlike other coding assistants such as GitHub Copilot, which are designed to suggest changes to code or generate blocks of code, Devin aims to build entire apps from just a text description. It takes a text input describing what needs to be built, creates a step-by-step plan for developing the requested app, and then writes the code.
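To make the "plan first, then write the code" idea a little more concrete, here is a minimal Python sketch of such a loop. It is not Cognition's implementation, which has not been published. The names `make_plan`, `write_code`, `run_agent` and `AgentRun` are hypothetical, and the planner and code writer are plain stubs standing in for what would be calls to a language model in a real agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    """Record of one agent run: the task, the plan, and the accepted outputs."""
    task: str
    plan: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def make_plan(task: str) -> List[str]:
    # Hypothetical planner stub; a real agent would ask a language model here.
    return [f"Design: {task}", f"Implement: {task}", f"Test: {task}"]


def write_code(step: str, attempt: int) -> str:
    # Hypothetical code-writer stub; it only returns placeholder text.
    return f"# attempt {attempt} for step '{step}'"


def run_agent(
    task: str,
    planner: Callable[[str], List[str]] = make_plan,
    writer: Callable[[str, int], str] = write_code,
    check: Callable[[str, str], bool] = lambda step, code: True,
    max_attempts: int = 3,
) -> AgentRun:
    """Plan the task, then work through the plan one step at a time.

    Each step is retried until `check` accepts the result or the
    attempt budget runs out.
    """
    run = AgentRun(task=task, plan=planner(task))
    for step in run.plan:
        for attempt in range(1, max_attempts + 1):
            code = writer(step, attempt)
            if check(step, code):
                run.outputs.append(code)
                break
        else:
            # No attempt passed the check for this step.
            raise RuntimeError(f"could not complete step: {step}")
    return run
```

Calling `run_agent("a to-do list app")` returns an `AgentRun` with a three-step plan and one accepted output per step. Passing a stricter `check` function (for example, one that runs tests against the generated code) is where the back-and-forth of a real agent would happen.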
Cognition claims Devin is much better than large language models such as GPT-4 or Claude 2 at solving coding tasks in SWE-Bench. Cognition describes its methodology in detail in the technical report. It would be interesting to see how Devin would compare to Claude 3 or Google Gemini.
Devin is not yet available to the public, so independent verification of Cognition's claims is limited. However, the company has published a couple of videos showing different people writing working programs with Devin, showcasing Devin’s ability to use unfamiliar technologies, deploy web apps, autonomously find and fix bugs in codebases, and solve real jobs on Upwork. Upon closer examination of these videos, software engineers noted certain issues. While not explicitly acknowledged by Cognition, it appears that Devin can take a considerable amount of time (from several minutes up to 30 minutes) to produce a response.
Under the hood, Devin uses reinforcement learning on top of GPT-4, which means it very likely generates multiple possible answers and evaluates them to find the best one. In theory, this approach can result in AI models with good reasoning capabilities, but it is computationally very heavy and therefore expensive.
Although Cognition is a very young company, it has raised $21M to date from Peter Thiel’s Founders Fund, former Twitter executive Elad Gil, and DoorDash co-founder Tony Xu.
Devin is the latest step in automating coding. Andrej Karpathy perfectly summarises the recent trends in writing code in this tweet. Software engineers quickly adopt tools that make them more efficient. AI coding assistants like GitHub Copilot quickly became part of the modern software engineering toolkit. Tools like Devin represent the next step forward in automated coding, where humans act more as supervisors who express their ideas at a high level of abstraction (for example, “write a Tinder for cats”). The AI then writes the app and goes back and forth until it meets the requirements.
Right now, Devin and similar tools won’t replace software engineers. However, AI will only improve, and it is possible that it will soon code faster and better than any human. Jensen Huang, the CEO of Nvidia, even said that children today shouldn't learn to code because AI will do the coding for them. Others disagree, saying there will still be a need for human software engineers. In either case, software engineering will change substantially in the next few years. Senior and experienced developers will be fine, but junior and less experienced developers will most likely be the most impacted by these new AI tools.
Interestingly, Cognition is looking to hire a software engineer. One might ask why the company bothers to hire someone since they have an AI ready to do the job.
Devin also represents another trend in AI: the emergence of AI agents. Last year, following the release of ChatGPT and GPT-4, some people realized the potential of asking these AI models to outline solutions for complex tasks and then executing these plans step by step. There was a brief period when projects such as AutoGPT saw a surge in popularity, and then the hype faded away. Now, it appears that AI agents are making a comeback. Devin serves as a perfect example of how an AI agent works: just describe the task and the AI will figure out the rest.
And it is not just AI enthusiasts looking forward to AI agents. Big players on the AI scene are interested in them, too. During the first OpenAI Dev Day last year, Sam Altman mentioned that the next milestone on the path to Artificial General Intelligence (AGI) involves the development of agents: highly capable bots that can plan and execute complex tasks. Demis Hassabis, the CEO of Google DeepMind, openly speaks about applying AlphaGo-like reinforcement learning features to create models capable of reasoning. Q*, the model that triggered the OpenAI drama in November 2023, is rumoured to also follow this or a similar approach.
Although Devin is not perfect and won't replace software engineers immediately, it offers a glimpse into the future. AI agents have the potential to empower individuals to accomplish tasks that would normally require a team. However, these advancements may also negatively affect the livelihoods of many. Regardless, we are on the cusp of significant changes to how we live and how we work.
If you enjoy this post, please click the ❤️ button or share it.
Do you like my work? Consider becoming a paying subscriber to support it. For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is generous support for the work put into this newsletter. Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.
🦾 More than a human
Brain stimulation tech wins €5M to fight depression at home
🧠 Artificial Intelligence
MEPs approve world's first comprehensive AI law
Nvidia is sued by authors over AI use of copyrighted works
Claude 3 Haiku: our fastest model yet
Last week, Anthropic released the Claude 3 family of models, the new best large language models available. However, only two out of three models were initially released. This week, Anthropic has released the final model in the Claude 3 family, named Haiku. According to benchmarks provided by Anthropic, Haiku, the smallest model in the Claude 3 family, outperforms both GPT-3.5 and Gemini 1.5 Pro across nearly all benchmarks while also being faster and cheaper than its competitors.
▶️ Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat (1:01:33)
In this interview, Demis Hassabis, the CEO of Google DeepMind, shares why he thinks the path to AGI is a combination of large language models and reinforcement learning and why we can expect AGI to happen this decade. The other topics covered are the standard AI safety and alignment questions, how DeepMind plans to balance open source with safety and security, what’s next for DeepMind, and why Hassabis is so passionate about robotics.
Can Chinese companies make Sora? This Tsinghua large model team gives hope
Google DeepMind SIMA - a generalist AI agent for 3D virtual environments
Cerebras WSE-3 AI Chip Launched 56x Larger than NVIDIA H100
🤖 Robotics
Got To Go Fast: The Rise Of Super-Fast FPV Drones
After months of work, Luke Maximo Bell has built the world's fastest FPV drone. This drone, which looks more like a mini rocket than a traditional drone, can achieve speeds of up to 401 km/h. In a drag race, the drone was faster than a Red Bull F1 car and managed to keep pace with Max Verstappen around Silverstone, something that not everyone can do these days.
▶️ Figure Status Update - OpenAI Speech-to-Speech Reasoning (2:34)
Two weeks ago, Figure announced a partnership with OpenAI. This week, the company has showcased the results of this collaboration. In a recent video, Figure 01 leverages OpenAI's models to engage in comprehensive conversations. The robot is capable of describing its surroundings, understanding commands in plain English, planning future actions, and applying common sense reasoning. Moreover, it can reflect on its memories and articulate its thought process verbally. This impressive demonstration is further proof that the era of commercial humanoid robots may be closer than we think.
Mercedes begins piloting Apptronik humanoid robots
ANYmal robot has a new skill: parkour
ANYmal, a four-legged robot by ETH Zurich researchers, joins an elite club of robots that can do parkour. Although ANYmal sometimes lacks the grace of Boston Dynamics’ Atlas, it stands out for its ability to keep its footing on both unstable and slippery surfaces, proving itself highly capable of navigating challenging terrains like construction sites and disaster zones. This advancement not only pushes the limits of what robots can do but also prepares ANYmal for real-world applications like search and rescue missions.
🧬 Biotechnology
▶️ Can We Use Bacteria to Refine Rare Earths? (14:17)
Modern electronics would not be possible without rare earth elements. However, extracting them from ore is a complex, expensive and toxic process. This video explores the idea of using bacteria to refine rare earth elements and how these new biological methods could potentially resolve the issues associated with current chemical extraction methods.
Scientists move step closer to making IVF eggs from skin cells
Thanks for reading.
Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.
A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!
My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"