The Sequence Chat: Doug Burger- Technical Fellow, Microsoft Research About Building Autonomous Agents, AutoGen and…
One of the members of the AutoGen team shares insights about its vision, architecture, and the future of autonomous agents.

Quick bio
I started out as a researcher in computer architecture, focusing on CPU and memory systems architecture. I spent a decade as an academic (Computer Sciences professor at the University of Texas at Austin), co-leading a DARPA-funded research program there. In 2008 I joined Microsoft and spent a decade working in Microsoft Research across several areas, including computer architecture, AI, computing systems, and reconfigurable computing. After that I shifted to Azure with the founding of the Azure Hardware Systems group, and served as a product executive building AI supercomputers. A bit under a year ago I moved back to Microsoft Research to take on a leadership role and help evolve the organization to meet the challenge of the new AI era.

I've been interested in AI for a long time, but didn't work in it as a core area until about 2015. I co-authored my first AI-related paper in 2000 (using neural networks to manage on-CPU hardware resources). My colleagues and I did other explorations on the boundaries of AI and computer architecture over the next 15 years, including analog neural networks for low-power branch prediction (2008), using neural networks to replace conventional code running on CPUs (2012), and building analog neural network accelerators into CPUs to offload that code (2014). In 2013, my team started exploring building neural network accelerators on FPGAs, and in 2015 we ramped that effort up to take it to production as Project Brainwave, which shipped at large scale in 2017, accelerating neural network inference for Bing and Office. In 2018, when I moved into Azure, my team started building custom AI supercomputers at scale. Some of that work involved deep algorithmic work, which culminated in this year's announcement of the MX consortium, which standardized 4-, 6-, and 8-bit datatypes for ultra-efficient AI computation.

🛠 AI Work
I'm not actually one of the co-creators. I've been working closely with Chi Wang, Gagan Bansal, and Ahmed Awadallah on the AutoGen team. When I met them last summer, I realized the importance of this area and what they were building, and started meeting with the team weekly to see how I could help. The potential of this area (and project) is to uplevel the capabilities of AI working with humans, allowing everyone to achieve more, which is in line with Microsoft's mission as a company. The team built a beautiful library that made multi-agent orchestration, with human feedback in the loop, simple and powerful. That's one reason for its great success and growing popularity on GitHub. What I wanted to see was a scientific study of why different patterns worked well or poorly. What combination of agent capabilities would be most effective at solving tasks? How would you know that a particular subtask was complete and correct, without a human monitoring each agent interaction? In the open-source community, there are huge numbers of people leveraging AutoGen in creative ways and solving surprising problems. One pattern that we see as fundamental is the "generator+critic" pattern, where one agent generates content (writing, code, etc.) and another agent critiques it (finds bugs, etc.). The agents can iterate until the solution is correct, or find problems in the environment and automatically install the packages needed for the generated code to run. Right now, the AutoGen user has the option to be in the loop for any interaction, which is super valuable as we are figuring out how to apply this technology to successively larger problems.
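The generator+critic loop described above can be sketched in a few lines. This is an illustrative stand-in, not AutoGen's actual API: the `generate` and `critique` functions below are hypothetical placeholders for what would be LLM-backed agents in a real deployment.

```python
# Hypothetical sketch of the generator+critic pattern. In AutoGen these
# would be two LLM-backed agents exchanging messages; here simple
# functions stand in so the control flow is runnable.

def generate(task, feedback=None):
    """Stand-in generator: produces a draft, revising if given feedback."""
    draft = f"solution for: {task}"
    if feedback:
        draft += " (revised)"
    return draft

def critique(draft):
    """Stand-in critic: returns None when satisfied, else feedback."""
    if "(revised)" in draft:
        return None  # no issues found
    return "needs revision"

def generator_critic_loop(task, max_rounds=3):
    """Iterate generation and critique until the critic is satisfied."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft  # critic approved the draft
    return draft  # give up after max_rounds

print(generator_critic_loop("sort a list"))  # solution for: sort a list (revised)
```

The point of the framework is that this loop, plus optional human interception at any step, comes essentially for free once the two agents are wired together.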
Ultimately, we will want more of the agent communication graph to be "closed loop", meaning that humans don't need to review the generated information and can interact in an open-loop fashion at a higher level (for example, refining specifications where they are imprecise, or giving feedback about how the agents are interacting if they can't make forward progress). Ultimately, graphs of human collaborations and graphs of agent collaborations will be interleaved, and I think we'll find that the ideal interleavings will be quite surprising. The two most important problems (at least as I am thinking about them currently) are:
AutoGen has three core components that contributed to its success early on. First, it is incredibly simple and lightweight; setting up multiple agents interacting through "conversational programming" is straightforward. That ease of use allows people to get up and running quickly. Second, the abstractions AutoGen provides are fully general, but they can be realized in useful specific ways: the notion of a "User Proxy Agent" allows people to choose to be an agent in the graph, intercept messages, provide their own, or let the agents run. That capability greatly simplifies keeping things on track, especially in these early days when we don't know how to fully ground the conversations and inter-agent collaborations. Third, AutoGen's flexible topologies allow for arbitrary and creative organizations of agent graphs. An example is AutoGen's Group Chat Manager, which allows an arbitrary set of agents to participate in a chat. The Group Chat Manager selects the agent that seems like the best choice to respond at each iteration of the conversation. This dynamism allows many types of agents to work together with low friction for the user or programmer. Put together, these three components allow people new to the platform to build sophisticated groups of agents without having to do a lot of debugging or experimentation. That low friction is central to AutoGen's success and momentum.
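The User Proxy Agent idea can be illustrated with a minimal sketch (a hypothetical class, not AutoGen's real `UserProxyAgent` API): the key mechanism is a node in the message graph where a human-supplied callback can override a message, or pass it through untouched to let the agents run.

```python
# Hypothetical sketch of the "User Proxy Agent" concept: a human can sit
# in the agent graph and intercept messages, or approve them automatically.

class UserProxyAgent:
    def __init__(self, get_human_input):
        # get_human_input(message) returns a replacement string,
        # or None to let the message pass through unchanged.
        self.get_human_input = get_human_input

    def relay(self, message):
        """Forward a message, letting the human optionally override it."""
        override = self.get_human_input(message)
        return override if override is not None else message

# Fully autonomous: auto-approve every message.
auto = UserProxyAgent(lambda msg: None)
print(auto.relay("run the tests"))  # run the tests

# Human in the loop: intercept and rewrite the message.
human = UserProxyAgent(lambda msg: "run only the unit tests")
print(human.relay("run the tests"))  # run only the unit tests
```

The same object can thus dial anywhere between fully manual and fully autonomous operation, which is what makes it useful while inter-agent grounding is still immature.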
The most popular innovation in AutoGen’s multi-agent coordination is the Group Chat Manager. It uses the LLM capabilities themselves to guide arbitrary collections of agents working together, as opposed to exposing that complexity to the user in a space that is new and not well understood. Beyond Group Chat Manager, AutoGen also supports many popular conversational patterns, such as one-to-one, hierarchical, and nested chats. Over time, as we understand the patterns that work well for different types and compositions of groups of agents, specific point functionality like the Group Chat Manager may diminish in importance, but it's been incredibly helpful for getting high value, low-friction experiences off the ground for users quickly. AutoGen also supports some interesting features, such as dynamic agents, which can—on the fly—decide to initiate and consult new agents. One of the exciting aspects of the exploding popularity of this tool is seeing the surprising and creative ways that users are leveraging these more advanced features.
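The speaker-selection step at the heart of a group chat can be sketched as follows. In AutoGen the selection is made by the LLM itself; here a simple keyword heuristic stands in so the control flow is runnable, and the agent names and keywords are invented for illustration.

```python
# Illustrative sketch of Group Chat Manager-style speaker selection.
# A real manager asks an LLM which agent should speak next; this
# keyword-matching heuristic is a runnable stand-in.

AGENTS = {
    "coder":    ["code", "bug", "function"],
    "writer":   ["draft", "essay", "summary"],
    "executor": ["run", "execute", "install"],
}

def select_next_speaker(last_message):
    """Pick the agent whose keywords best match the last message."""
    msg = last_message.lower()

    def score(name):
        return sum(kw in msg for kw in AGENTS[name])

    return max(AGENTS, key=score)

print(select_next_speaker("please fix this bug in the function"))  # coder
print(select_next_speaker("execute and install the missing package"))  # executor
```

Replacing the heuristic with an LLM call is what lets arbitrary, previously unseen collections of agents coordinate without the programmer hand-writing a routing policy.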
AutoGen's value (today) is in low-friction assembly of agents, tools, and human feedback, which results in this "conversation-centric computing" paradigm. The core technique is simple: basic message passing among agents, humans, and tools (code). There is really nothing special in the messaging; the popularity of AutoGen really resides in the low friction with which interesting combinations of agents can be brought up and running quickly. Many of the capabilities in OpenAI's platform enhance AutoGen's capabilities as well, since AutoGen sits above the level of large language models.
Because AutoGen sees the entire agent graph, it can make optimizations in the back end. Some of the optimizations AutoGen supports include performance tuning, transparent error handling, and caching. AutoGen does not implement its own inference stack, but rather calls LLMs through pre-existing APIs that leverage inference optimizations. Additionally, since AutoGen is model independent, over time it can support a fleet of optimized per-topic "expert agents" that are called where appropriate, rather than calling expensive foundation models for every type of agent. Specifically, we are exploring advanced AI model techniques such as those contained in MSR's Orca and phi models.
We have two parallel tracks that we are pursuing. The first is to move the platform forward with requests from the open-source community and integration with new capabilities like OpenAI's Assistants/custom GPTs APIs. The second is to advance the science of automated problem solving. One direction is to leverage learning loops to identify which combinations of agents best solve problems. Another is to advance our understanding of how to automatically partition tasks into solvable sub-tasks with AutoGen.
All of these frameworks are useful and provide different (but often overlapping) sets of capabilities. All of them are essentially running experiments to see which features will be most useful, which abstractions people most like, and which classes of problems each set of capabilities can address. And given that they are all either open source or open access, they will compose in interesting ways. Semantic Kernel supports both AutoGen and its own native multi-agent approach. AutoGen just integrated with OpenAI's Assistants/custom GPT developer interfaces. I think having an ebb and flow, and varying levels of integration between these frameworks, is allowing the community to experiment rapidly and letting all of us advance the utility of these frameworks more quickly than if they were rigid verticals with no cross-pollination.

💥 Miscellaneous – a set of rapid-fire questions
I don't have a favorite area; there are a ton of research problems I'm interested in, and I've historically worked across many areas. Understanding biological neural networks is one current focus. In the past, I've spent a lot of time working on more efficient silicon architectures, particularly dataflow architectures, and on advanced numerical quantization approaches for deep learning. Another area I'm excited about is programming languages for hardware synthesis, for both ASICs and reconfigurable computing.
There are several important research areas and problems where breakthroughs would take multi-agent systems to the next level. One is a more formal view of explainability: what semantic features in LLMs do different prompts invoke, and which semantic hierarchies invoked across multiple agents are most effective at solving problems, working together, being creative, etc.? These models contain so much information, but we don't have good structures for reasoning about how even one invocation works, let alone how to think about multiple invocations collaborating. I expect this area to advance empirically for now, with learning loops improving the empirical results, but it would be wonderful to have some deeper theory to understand why different combinations of agents work well or poorly. Second, having formal abstractions to reason about the correctness of a wide range of problems would be valuable. Things like code are testable, because they (ideally) have precise specifications. But applying a notion of "correct" or "good enough" to a wide range of problems will allow multi-agent systems to be much more effective. Finally, we need formal structures to support decomposition and recomposition of tasks into subtasks and back into tasks. Currently our approaches are ad hoc, and having formal structures to solve problems hierarchically (for general problems) will be essential. These structures may also change how we architect solutions; sort of an AI version of Conway's Law. Another Conway's Law-related observation is that the capabilities of the models will also change the topology of the ideal multi-agent solutions, an observation we refer to as Gabuchi's Law.
It's a really great (and tough) question. I recently learned from Eric Horvitz, Microsoft's Chief Scientific Officer, that the originator of the "technological singularity" concept was actually John von Neumann, but he used it in a different fashion than Ray Kurzweil and others use it today. He meant it as the point where technology was advancing so rapidly that it was not possible to extrapolate and make predictions about even the near-term future. I feel like we are at that point; I built a multi-agent application this week that would not have been possible just two weeks ago. But researchers should predict, so I'll make a prediction that will likely be wrong: in five years we will have a much deeper understanding of how human collaborative graphs and AI collaborative graphs work. We'll be able to mix and match them to design systems that can give us much better outcomes on hard problems. My personal dream is that we can use these capabilities to solve problems, like building a more fair or more sustainable society, that are beyond our reach today because they interact with large-scale human and sociotechnical systems. It's also possible that these technologies will lead us to bigger problems. Just as evolution is unpredictable, it's unclear what driving forces will guide how these technologies shape society (what is the equivalent of natural selection?). In part that is up to us, to the extent that we can understand and guide how this technology affects society. We might need sophisticated multi-agent systems to understand how to steer sophisticated multi-agent systems for responsible use. All of us have a collective responsibility to steer these powerful technologies in directions that do more good than harm.