How Devin Signals the Age of AI Agent - Weekly News Roundup - Issue #458
Plus: humanoid robot understands human speech; Nvidia gets sued over AI use of copyrighted works; Mercedes-Benz will trial a humanoid robot; DeepMind SIMA; and more!

Hello and welcome to Weekly News Roundup Issue #458. This was a week full of big news in the world of AI and robotics. We will take a closer look at Devin, the “first AI software engineer”, and what it can tell us about the future of AI assistants and AI agents.

In other news, Google DeepMind released SIMA, an AI agent playing 3D games, MEPs approved the world's first comprehensive AI law, and Nvidia got sued over AI use of copyrighted works. Meanwhile, OpenAI got into some trouble (again), and that needs a separate article to cover it properly. It was also a big week for humanoid robots - Figure has shown the fruits of their partnership with OpenAI and Mercedes-Benz will trial Apptronik's humanoid robots. Enjoy!

On March 12th, 2024, Cognition AI emerged from stealth mode and showed the world Devin, “the first AI software engineer”. The software engineering community had mixed reactions to this news. Some responded with fear, anxiety, or anger, while others were more excited. Let's take a closer look at Devin, what it means for the future of software engineering and what it says about the future trends in AI.

According to Cognition, Devin is “the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.”

Unlike other coding assistants such as GitHub Copilot, which are designed to suggest changes to code or generate blocks of code, Devin aims to build entire apps from just a text description. It takes a text input describing what needs to be built, creates a step-by-step plan for developing the requested app, and then writes the code.
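To make the plan-then-write workflow described above a bit more concrete, here is a minimal sketch in Python. It is not Cognition's implementation, which has not been published; the names `run_agent`, `toy_planner` and `toy_executor` are hypothetical, and the two toy functions simply stand in for calls to a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentRun:
    """Record of one plan-then-execute run."""
    task: str
    plan: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def run_agent(task: str,
              planner: Callable[[str], List[str]],
              executor: Callable[[str], str]) -> AgentRun:
    """Ask the planner for a list of steps, then execute them in order."""
    run = AgentRun(task=task)
    run.plan = planner(task)
    for step in run.plan:
        run.outputs.append(executor(step))
    return run


# Hypothetical stand-ins: a real agent would call a language model here.
def toy_planner(task: str) -> List[str]:
    return [f"design: {task}", f"implement: {task}", f"test: {task}"]


def toy_executor(step: str) -> str:
    return f"done -> {step}"


if __name__ == "__main__":
    result = run_agent("todo app", toy_planner, toy_executor)
    for line in result.outputs:
        print(line)
```

The point of the sketch is only the shape of the loop: the text description goes in once, a plan comes out, and each step of the plan is handled in turn without further human input.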
Cognition claims Devin is much better than other large language models such as GPT-4 or Claude 2 at solving coding tasks in SWE-Bench. Cognition describes their methodology in detail in the technical report. It would be interesting to see how Devin would compare to Claude 3 or Google Gemini.

Devin is not yet available to the public, so independent verification of Cognition's claims is limited. However, the company has published a couple of videos showing different people writing working programs with Devin, showcasing Devin’s ability to use unfamiliar technologies, deploy web apps, autonomously find and fix bugs in codebases and solve real jobs on Upwork.

Upon closer examination of these videos by software engineers, certain issues were noted. While not explicitly acknowledged by Cognition, it appears that Devin can take a considerable amount of time, from several minutes up to 30 minutes, to produce a response. Under the hood, Devin uses reinforcement learning on top of GPT-4, which means it very likely generates multiple possible answers and evaluates them to find the best one. In theory, this approach can result in AI models with good reasoning capabilities, but it is computationally very heavy and therefore expensive.

Although Cognition is a very young company, it has raised $21M to date from Peter Thiel’s Founders Fund, former Twitter executive Elad Gil, and DoorDash co-founder Tony Xu.

Devin is the latest step in automating coding. Andrej Karpathy perfectly summarises the recent trends in writing code in this tweet. Software engineers quickly adopt tools that make them more efficient. AI coding assistants like GitHub Copilot quickly became part of a modern software engineering toolkit. Tools like Devin represent the next step forward in automated coding, where humans act more as supervisors who express their ideas on a high level of abstraction (for example, “write a Tinder for cats”).
AI then writes the app and goes back and forth until it meets the requirements.

Right now, Devin and similar tools won’t replace software engineers. However, the AI will only improve, and it is possible that soon, the AI will code faster and better than any human. Jensen Huang, the CEO of Nvidia, even said that children today shouldn't learn to code because AI will do the coding for them. Others disagree, saying there will still be a need for human software engineers. In either case, software engineering will substantially change in the next few years. Senior and experienced developers will be fine, but junior and less experienced developers will most likely be the most impacted by these new AI tools.

Interestingly, Cognition is looking to hire a software engineer. One might ask why the company bothers to hire someone since they have an AI ready to do the job.

Devin also represents another trend in AI - the emergence of AI agents. Last year, following the release of ChatGPT and GPT-4, some people realized the potential of asking these AI models to outline solutions for complex tasks and then executing these plans step by step. There was a brief period when projects such as AutoGPT saw a surge in popularity, and then the hype faded away. Now, it appears that AI agents are making a comeback. Devin serves as a perfect example of how an AI agent works - just describe the task and the AI will figure out the rest.

And it is not just AI enthusiasts looking forward to AI agents. Big players on the AI scene are interested in them, too. During the first OpenAI Dev Day last year, Sam Altman mentioned that the next milestone on the path to Artificial General Intelligence (AGI) involves the development of agents: highly capable bots that can plan and execute complex tasks. Demis Hassabis, the CEO of Google DeepMind, openly speaks about applying AlphaGo-like reinforcement learning features to create models capable of reasoning.
Q*, the model that triggered the OpenAI drama in November 2023, is rumoured to also follow this or a similar approach.

Although Devin is not perfect and won't replace software engineers immediately, it offers a glimpse into the future. AI agents have the potential to empower individuals to accomplish tasks that would normally require a team. However, these advancements may also negatively affect the livelihoods of many. Regardless, we are on the cusp of significant changes to how we live and how we work.

If you enjoy this post, please click the ❤️ button or share it.

Do you like my work? Consider becoming a paying subscriber to support it.

For those who prefer to make a one-off donation, you can 'buy me a coffee' via Ko-fi. Every coffee bought is generous support towards the work put into this newsletter.

Your support, in any form, is deeply appreciated and goes a long way in keeping this newsletter alive and thriving.

🦾 More than a human

Brain stimulation tech wins €5M to fight depression at home

🧠 Artificial Intelligence

MEPs approve world's first comprehensive AI law

Nvidia is sued by authors over AI use of copyrighted works

Claude 3 Haiku: our fastest model yet
Last week, Anthropic released the Claude 3 family of models, the new best large language models available. However, only two out of three models were initially released. This week, Anthropic has released the final model in the Claude 3 family, named Haiku. According to benchmarks provided by Anthropic, Haiku, the smallest model in the Claude 3 family, outperforms both GPT-3.5 and Gemini 1.0 Pro across nearly all benchmarks while also being faster and cheaper than its competitors.

▶️ Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat (1:01:33)
In this interview, Demis Hassabis, the CEO of Google DeepMind, shares why he thinks the path to AGI is a combination of large language models and reinforcement learning and why we can expect AGI to happen this decade.
The other topics covered were your standard AI safety and alignment questions, how DeepMind plans to balance open source with safety and security, what’s next for DeepMind and why Hassabis is so passionate about robotics.

Can Chinese companies make Sora? This Tsinghua large model team gives hope

Google DeepMind SIMA - a generalist AI agent for 3D virtual environments

Cerebras WSE-3 AI Chip Launched 56x Larger than NVIDIA H100

If you're enjoying the insights and perspectives shared in the Humanity Redefined newsletter, why not spread the word?

🤖 Robotics

Got To Go Fast: The Rise Of Super-Fast FPV Drones
After months of work, Luke Maximo Bell has built the world's fastest FPV drone. This drone, resembling more a mini rocket than a traditional drone, can achieve speeds of up to 401 km/h. In a drag race, the drone was faster than a Red Bull F1 car and managed to keep pace with Max Verstappen around Silverstone - something that not everyone can do these days.

▶️ Figure Status Update - OpenAI Speech-to-Speech Reasoning (2:34)
Two weeks ago, Figure announced a partnership with OpenAI. This week, the company has showcased the results of this collaboration. In a recent video, Figure 01 leverages OpenAI's models to engage in comprehensive conversations. The robot is capable of describing its surroundings, understanding commands in plain English, planning future actions, and applying common sense reasoning. Moreover, it can reflect on its memories and articulate its thought process verbally. This impressive demonstration is further proof that the era of commercial humanoid robots may be closer than we think.

Mercedes begins piloting Apptronik humanoid robots

ANYmal robot has a new skill: parkour
ANYmal, a four-legged robot by ETH Zurich researchers, joins an elite club of robots that can do parkour.
Although ANYmal sometimes lacks the grace of Boston Dynamics’ Atlas, it stands out for its ability to keep its footing on both unstable and slippery surfaces, proving itself highly capable of navigating challenging terrains like construction sites and disaster zones. This advancement not only pushes the limits of what robots can do but also prepares ANYmal for real-world applications like search and rescue missions.

🧬 Biotechnology

▶️ Can We Use Bacteria to Refine Rare Earths? (14:17)
Modern electronics would not be possible without rare earth elements. However, extracting them from ore is a complex, expensive and toxic process. This video explores the idea of using bacteria to refine rare earth elements and how these new biological methods could potentially resolve the issues associated with current chemical extraction methods.

Scientists move step closer to making IVF eggs from skin cells

Thanks for reading. If you enjoyed this post, please click the ❤️ button or share it.

Humanity Redefined sheds light on the bleeding edge of technology and how advancements in AI, robotics, and biotech can usher in abundance, expand humanity's horizons, and redefine what it means to be human.

A big thank you to my paid subscribers, to my Patrons: whmr, Florian, dux, Eric, Preppikoma and Andrew, and to everyone who supports my work on Ko-Fi. Thank you for the support!

My DMs are open to all subscribers. Feel free to drop me a message, share feedback, or just say "hi!"