The Sequence Chat: Yohei Nakajima on Creating BabyAGI, Autonomous Agents and Investing in Generative AI
The creator of one of the most popular open source generative AI projects shares his views about AI tech, investing and the future.

Quick bio
My name is Yohei. I was born in Japan, raised in Seattle, and went to college in California. I’ve been working with startups my whole career, initially on the community side (in LA), and on the investing side for over a decade, starting with helping launch the Disney Accelerator. I became the first Director of Pipeline for Techstars before joining Scrum Ventures, which led me to start my own VC firm, Untapped Capital. Specific to AI, I’d played with a few APIs back when I was at Techstars, but this recent deep dive started in August of ’22, a few months before ChatGPT. I’ve always been a build-to-learn kind of guy, and have done this across no-code, web3, and more. Doing this publicly (build-in-public) is how I accelerate my learning, while also connecting and collaborating with founders.

🛠 AI Work
BabyAGI was project number 70 or so in a series of experiments and prototypes I’ve built with AI. The inspiration for this project was HustleGPT, where people were using ChatGPT as a cofounder and doing whatever it told them to do. I wanted to experiment with taking the human element out of this, and embarked on a weekend challenge to prototype an autonomous startup founder. When I shared a demo video online, people were quick to identify that this framework could be used for more, which is where it got the nickname BabyAGI (from my friend Jenny).
I’ve tried a couple of things, but what’s stood out to me in my experiments is the ability to learn over time. In the most recent modification of BabyAGI, every task list is analyzed alongside its output to generate a “reflection” of sorts, which we store alongside the objective and task list. Any time we run a new objective, we do a vector search to find the most similar past objectives, pull in their reflection notes, and write a pre-reflection note based on them that gets fed into the task list generator. On a small scale, this has worked in giving BabyAGI the ability to create better task lists over time, even for the same objective. What I like is that this mimics our ability to improve through repetition, and the same approach could be applied to generating code, which is more on the execution side.
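A minimal sketch of the reflection loop described above, assuming a generic embedding and completion API. The names `embed`, `llm`, and `ReflectionStore` are illustrative placeholders, not BabyAGI’s actual code:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return an embedding vector for `text` (plug in any embedding API)."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Placeholder: return a completion for `prompt` (plug in any LLM API)."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class ReflectionStore:
    """Stores (objective, task list, reflection) records keyed by objective embedding."""

    def __init__(self):
        self.records: list[tuple[np.ndarray, str, str, str]] = []

    def add(self, objective: str, task_list: str, output: str) -> None:
        # Analyze the finished run and store a reflection alongside
        # the objective and task list.
        reflection = llm(
            f"Objective: {objective}\nTask list: {task_list}\nOutput: {output}\n"
            "Write brief notes on what worked and what to improve next time."
        )
        self.records.append((embed(objective), objective, task_list, reflection))

    def similar_reflections(self, objective: str, k: int = 3) -> list[str]:
        # Vector search for the most similar past objectives.
        query = embed(objective)
        ranked = sorted(self.records, key=lambda r: cosine(r[0], query), reverse=True)
        return [r[3] for r in ranked[:k]]

def generate_task_list(objective: str, store: ReflectionStore) -> str:
    # Write a pre-reflection note from past reflections, then feed it
    # into the task list generator.
    notes = store.similar_reflections(objective)
    pre_reflection = llm(
        f"Objective: {objective}\nPast reflections: {notes}\n"
        "Summarize the lessons to apply this time."
    )
    return llm(
        f"Objective: {objective}\nLessons: {pre_reflection}\n"
        "Produce an ordered task list."
    )
```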
Autonomous agents, especially general ones, are best suited for edge cases. For organizations, the most valuable workflows to automate are the ones that happen repeatedly, meaning there is no need for an agent to generate the task list. The reality today is that there is a ton of value organizations can gain from automation tools like Zapier, even without the use of AI. Reflecting on what I’ve seen here, I suspect the biggest obstacle to widespread adoption is changing human behavior, a challenge that compounds in a complex organization with multiple stakeholders and varying incentives.
RAG is a great way to get context, but simply looking at documents is just scratching the surface. Ultimately, we’ll want our agents to be able to RAG against all human knowledge, against their previous runs, their own code, and so on. More challenging today is giving AI access to the tools it needs to execute tasks (calendar, messaging, etc.), because it requires managing and storing authentication credentials from the user, understanding how the tool is used, deciding how the AI can use the tool (API or browser), and in some cases adapting RAG techniques to match the data structure. One approach is building these integrations one at a time, which is more stable and gets to market more quickly, but I believe the goal should be building a system that can teach itself how to use new tools.
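A sketch of what the “one integration at a time” approach might look like: each tool is registered with a description the model can read, a stored credential, and a callable. Everything here (the `Tool` dataclass, the registry, the calendar example) is a hypothetical illustration, not a real agent API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str           # read by the model when choosing a tool
    auth_token: str            # per-user credential; store securely in practice
    run: Callable[[str], str]  # the actual integration call

class ToolRegistry:
    def __init__(self) -> None:
        self.tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def catalog(self) -> str:
        # A text catalog the agent can read to decide which tool to call.
        return "\n".join(f"- {t.name}: {t.description}" for t in self.tools.values())

    def dispatch(self, name: str, argument: str) -> str:
        return self.tools[name].run(argument)

# Hypothetical calendar integration; a real one would call the calendar's
# API with the stored auth token.
def create_event(argument: str) -> str:
    return f"created event: {argument}"

registry = ToolRegistry()
registry.register(
    Tool("calendar.create", "Create a calendar event", "user-token-123", create_event)
)
```

The self-teaching alternative he describes would replace the hand-written `run` functions with something the agent synthesizes from tool documentation.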
Candidly, this is far outside my area of expertise, but based on my observations, it does seem like the rapid experimentation on the orchestration side is slowly being embedded into the models themselves. You can almost imagine an MoE approach of three experts in a loop like BabyAGI. That being said, I’m unsure whether things like RAG or tool usage (engaging with things outside the model) can be done natively from within the model… unless the model has a code sandbox within it? Unsure. Regardless, it does feel like the effort in building better orchestration will help models improve, so I think it’s not wasted effort to experiment and explore newer and better orchestration methods.
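For context, the “three experts in a loop” analogy maps onto the original BabyAGI pattern: one role executes the current task, one creates new tasks, and one reprioritizes the queue. A minimal sketch of that loop, with `llm` standing in for any completion API (not BabyAGI’s actual code):

```python
from collections import deque

def llm(prompt: str) -> str:
    """Placeholder: return a completion for `prompt` (plug in any LLM API)."""
    raise NotImplementedError

def run(objective: str, first_task: str, max_steps: int = 5) -> list[tuple[str, str]]:
    tasks = deque([first_task])
    results = []
    for _ in range(max_steps):
        if not tasks:
            break
        # Expert 1: execute the highest-priority task.
        task = tasks.popleft()
        result = llm(f"Objective: {objective}\nTask: {task}\nComplete the task.")
        results.append((task, result))
        # Expert 2: create new tasks based on the result.
        created = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            f"Pending tasks: {list(tasks)}\nList any new tasks, one per line."
        )
        tasks.extend(line.strip() for line in created.splitlines() if line.strip())
        # Expert 3: reprioritize the remaining queue.
        reordered = llm(
            f"Objective: {objective}\nReorder these tasks by priority, one per line:\n"
            + "\n".join(tasks)
        )
        tasks = deque(line.strip() for line in reordered.splitlines() if line.strip())
    return results
```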
(1) AI everywhere: we’ll see bits and pieces of AI across all apps and businesses, regardless of whether they are AI companies, similar to how most companies store data in the cloud without being “cloud companies”.
(2) Passive AI: with costs going down, we’ll see an increasing amount of AI just running in the background, structuring and summarizing data, generating insights, etc.
(3) AI workers: so many people around the world spend a lot of time on tasks that don’t require people; we’ll see lots of those workflows automated over the next decade.
(4) Smaller/local/fine-tuned models: it’s still early days, but much like we went from general to personalized ads on the web, I suspect we’ll slowly start engaging with various models that are fine-tuned for us specifically and running on our phones, etc.
Candidly, I’m new to knowledge graphs, so I can’t speak to “high quality” knowledge graphs. I’ve had plenty of feedback that deduping is hard (it is), but in early RAG experiments on knowledge graphs, I’ve found that it can still work with imperfect deduping. I’m curious about this approach because the data structure feels closer to how our brains are wired, so it intuitively feels like the right way to do RAG. As we (humans) experience life, we’re constantly processing and restructuring information in our minds for more efficient recall and storage, so it makes sense to me that AI would benefit from the same type of activity. I think RAG against knowledge graphs, while there are some early examples, is still underexplored.
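A toy sketch of RAG over a knowledge graph: store (subject, relation, object) triples, retrieve the neighborhood of entities mentioned in the query, and pass those triples to the model as context. The triples and the `retrieve` helper are purely illustrative:

```python
# Toy knowledge graph as (subject, relation, object) triples.
triples = [
    ("BabyAGI", "written_in", "Python"),
    ("BabyAGI", "created_by", "Yohei Nakajima"),
    ("Yohei Nakajima", "founded", "Untapped Capital"),
]

def retrieve(query: str, hops: int = 1) -> list[tuple[str, str, str]]:
    """Return triples whose entities appear in the query, expanded by `hops` neighbors."""
    frontier = set()
    for s, _, o in triples:
        if s in query or o in query:
            frontier.update((s, o))
    selected = []
    for _ in range(hops):
        selected = [t for t in triples if t[0] in frontier or t[2] in frontier]
        for s, _, o in selected:
            frontier.update((s, o))
    return selected

# Feed the retrieved subgraph to the model as context.
question = "Who created BabyAGI?"
facts = "\n".join(f"{s} {r} {o}" for s, r, o in retrieve(question))
prompt = f"Answer using these facts:\n{facts}\nQuestion: {question}"
```

💥 Miscellaneous – a set of rapid-fire questions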
I suspect we’ll see lots of workflows that have been replaced by AI, new problems that arise from this, and new roles to solve those problems. That being said, the rollout won’t be immediate: we’ll see early adopters implement this, run into challenges, and solve them before late adopters start experimenting. This happens in stages, with experiments and replacements starting small and getting larger, and at varying speeds across different industries. In 3–5 years, I’d expect a good number of forward-thinking organizations to have a handful of AI workers capable of handling some of the tasks and workflows being done by humans today.
This one is also outside of my area of expertise, but for true AGI (depending on your definition), it seems like we’d want a model that can process multiple inputs in different modalities in parallel (audio, visual, etc.) and also stream parallel outputs across modalities (audio, text, movement) at the same time. My guess is this requires some new architecture beyond what we have today.
Human beings intrinsically have both self-serving and altruistic motivations, stemming from an evolutionary history of survival that includes wars and tribes. In my view, the balance of open source and proprietary models reflects this duality within us, and we’ll continue to see this balance ebb and flow based on a multitude of factors, from culture to economic results.
Da Vinci, because he understood the benefit of exploring the same idea through various modalities (image, text, math, etc.).