The Sequence Chat: Yohei Nakajima on Creating BabyAGI, Autonomous Agents and Investing in Generative AI
The creator of one of the most popular open source generative AI projects shares his views about AI tech, investing and the future.

Quick bio
My name is Yohei. I was born in Japan, raised in Seattle, and went to college in California. I've been working with startups my whole career, initially on the community side (in LA), and on the investing side for over a decade, starting with helping launch the Disney Accelerator. I became the first Director of Pipeline for Techstars before joining Scrum Ventures, which led me to starting my own VC firm, Untapped Capital. Specific to AI, I'd played with a few APIs back when I was at Techstars, but this recent deep dive started in August of '22, a few months before ChatGPT. I've always been a build-to-learn kind of guy, and have done this across no code, web3, and more. Doing this publicly (build-in-public) is how I accelerate my learning, while also connecting and collaborating with founders.

🛠 AI Work
BabyAGI was project number 70 or so in a series of experiments and prototypes I've built with AI. The inspiration for this project was HustleGPT, where people were using ChatGPT as a cofounder and doing whatever it told them to do. I wanted to experiment with taking the human element out of this and embarked on a weekend challenge to prototype an autonomous startup founder. When I shared a demo video online, people were quick to identify that this framework could be used for more - which is where it got the nickname BabyAGI (from my friend Jenny).
I've tried a couple of things, but what's stood out to me in my experiments is the ability to learn over time. In the most recent modification of BabyAGI, every task list is analyzed alongside the output of its tasks to generate a "reflection" of sorts that we store alongside the objective and task list. Any time we run a new objective, we do a vector search to find the most similar past objectives, pull in their reflection notes, and write a pre-reflection note based on them that gets fed into the task list generator. On a small scale, this has worked in giving BabyAGI the ability to create better task lists over time, even with the same objective. What I like is that this mimics our ability to improve through repetition, and the same approach could be applied to generating code, which is more on the execution side.
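The store-reflect-retrieve loop described above can be sketched minimally. This is not BabyAGI's actual implementation: the `ReflectionStore` and `pre_reflection` names are hypothetical, a toy bag-of-words embedding stands in for a real embedding model, and in practice the pre-reflection note would be written by an LLM rather than copied verbatim.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words embedding; a real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ReflectionStore:
    """Stores (objective, task list, reflection) records and retrieves
    the most similar past objective by vector search."""

    def __init__(self):
        self.records = []

    def add(self, objective, task_list, reflection):
        self.records.append({
            "objective": objective,
            "embedding": embed(objective),
            "task_list": task_list,
            "reflection": reflection,
        })

    def most_similar(self, objective):
        query = embed(objective)
        return max(self.records,
                   key=lambda r: cosine(query, r["embedding"]),
                   default=None)


def pre_reflection(store, objective):
    # Here we just surface the closest past reflection; in the loop the
    # interview describes, an LLM would rewrite it into a fresh note
    # that gets fed into the task list generator.
    match = store.most_similar(objective)
    if match is None:
        return "No similar past objectives found."
    return (f"Similar past objective: {match['objective']!r}. "
            f"Lesson: {match['reflection']}")
```

For example, after storing a reflection like "start marketing earlier" for a product-launch objective, a new "plan a software launch" objective would retrieve that lesson before any tasks are generated.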
Autonomous agents, especially general ones, are best suited for edge cases. For organizations, the most valuable workflows to automate are workflows that happen repeatedly, meaning there is no need for an agent to generate the task list. The reality today is that there is a ton of value organizations can gain from automation tools like Zapier, even without the use of AI. Reflecting on what I've seen here, I suspect the biggest obstacle to achieving widespread adoption is changing human behavior, which compounds in a complex organization with multiple stakeholders with varying incentives.
RAG is a great way to get context, but simply looking at documents is just scratching the surface. Ultimately, we'll want our agents to be able to RAG against all human knowledge, against their previous runs, their own code, etc. More challenging today is giving AI access to the tools it needs to execute tasks (calendar, messaging, etc.), as it requires managing and storing the user's authentication credentials, understanding how the tool is used, how the AI can use the tool (API or browser), and in some cases adapting RAG techniques to match the data structure. One approach is building these integrations one at a time, which is more stable and can get to market quicker - but the goal, I believe, should be building a system that can teach itself how to use new tools.
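The "one integration at a time" approach might look something like the sketch below: each tool is registered with a description the model can read, an auth flag, and a stored credential per user. All names here (`Tool`, `ToolRegistry`, the fake calendar) are hypothetical illustrations, not an API from BabyAGI or any real framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str          # shown to the model so it knows what the tool does
    requires_auth: bool
    run: Callable[..., str]   # the actual integration code, built by hand


class ToolRegistry:
    """Hand-built, one-at-a-time tool integrations: each tool carries a
    description and an auth requirement, and credentials are stored
    per tool before the agent is allowed to call it."""

    def __init__(self):
        self.tools: Dict[str, Tool] = {}
        self.credentials: Dict[str, str] = {}  # tool name -> stored token

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def authenticate(self, tool_name: str, token: str) -> None:
        # In a real system this would be an OAuth flow with secure storage.
        self.credentials[tool_name] = token

    def call(self, tool_name: str, **kwargs) -> str:
        tool = self.tools[tool_name]
        if tool.requires_auth and tool_name not in self.credentials:
            raise PermissionError(f"{tool_name} needs a stored credential")
        return tool.run(**kwargs)
```

The contrast with the self-teaching goal is that here a human writes each `run` function and decides its auth flow, whereas a system that teaches itself new tools would have to derive both from the tool's documentation or interface.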
Candidly, this is far outside my area of expertise, but based on my observations, it does seem like the rapid experimentation on the orchestration side is slowly being embedded into the models themselves. You can almost imagine an MoE approach of three experts in a loop like BabyAGI. That being said, I'm unsure if things like RAG or tool usage (engaging with things outside the model) can be done natively by the model... unless the model has a code sandbox within it? Unsure. Regardless, it does feel like the effort in building better orchestration will help models improve, so I think it's not wasted effort to experiment and explore newer and better orchestration methods.
(1) AI everywhere - we'll see bits and pieces of AI across all apps and businesses, regardless of whether they are an AI company, similar to how most companies store data in the cloud without being a "cloud company". (2) Passive AI - with costs going down, we'll see an increasing amount of AI just running in the background, structuring and summarizing data, generating insights, etc. (3) AI workers - so many people around the world are spending a lot of time doing tasks that don't require people, and we'll see lots of workflows being automated over the next decade. (4) Smaller/local/fine-tuned models - it's still early days, but much like we went from general to personalized ads on the web, I suspect we'll slowly start engaging with various models that are fine-tuned for us specifically and running on our phones, etc.
Candidly, I'm new to knowledge graphs, so I can't speak to "high quality" knowledge graphs. I've had plenty of feedback that deduping is hard (it is), but in early RAG experiments on knowledge graphs, I've found that it can still work with non-perfect deduping. I'm curious about this approach because the data structure feels closer to how our brain is wired, so it intuitively feels like the right way to do RAG. As we (humans) experience life, we're constantly processing and restructuring information in our minds for more efficient recall and storage, so it seems to make sense to me that AI would benefit from the same type of activity. I think RAG techniques against knowledge graphs, while there are some early examples, are still underexplored.

💥 Miscellaneous – a set of rapid-fire questions
I suspect we'll see lots of workflows replaced with AI, new problems that arise from this, and new roles to solve those problems. That being said, rollout won't be immediate, as we'll see early adopters implement this, run into challenges, and solve them before late adopters start experimenting. This happens in stages, with experiments and replacements starting small and getting larger, and at varying speeds across different industries. In 3-5 years, I'd expect a good number of forward-thinking organizations to have a handful of AI workers capable of handling some of the tasks and workflows being done by humans today.
This one is also outside my area of expertise, but for true AGI (depending on your definition), it seems like we'd want a model that can process multiple inputs in different modalities in parallel (audio, visual, etc.) and also stream parallel outputs across modalities (audio, text, movement) at the same time. My guess is this requires some new architecture beyond what we have today.
Human beings intrinsically have both self-serving and altruistic motivations, rooted in an evolutionary history of survival that includes wars and tribes. In my view, the balance of open source and proprietary models reflects this duality within us, and we'll continue to see this balance ebb and flow based on a multitude of factors, from culture to economic results.
Da Vinci, because he understood the benefit of exploring the same idea through various modalities (image, text, math, etc.).