The Sequence Chat: Doug Burger – Technical Fellow, Microsoft Research, About Building Autonomous Agents, AutoGen and the Future of Generative AI

One of the members of the AutoGen team shares insights about its vision, architecture and the future of autonomous agents.

Quick bio
I started out as a researcher in computer architecture, focusing on CPU and memory systems architecture. I spent a decade as an academic (Computer Sciences professor at the University of Texas at Austin), co-leading a DARPA-funded research program there. In 2008 I joined Microsoft and spent a decade working in Microsoft Research across several areas, including computer architecture, AI, computing systems, and reconfigurable computing. After that I shifted to Azure with the founding of the Azure Hardware Systems group, and served as a product executive building AI supercomputers. A bit under a year ago I moved back to Microsoft Research to take on a leadership role and help evolve the organization to meet the challenge of the new AI era.

I’ve been interested in AI for a long time, but didn’t work in it as a core area until about 2015. I co-authored my first AI-related paper in 2000 (using neural networks to manage on-CPU hardware resources). My colleagues and I did other explorations on the boundaries of AI and computer architecture over the next 15 years, including analog neural networks for low-power branch prediction (2008), using neural networks to replace conventional code running on CPUs (2012), and building analog neural network accelerators into CPUs to offload that code (2014). In 2013, my team started exploring building neural network accelerators on FPGAs, and in 2015 we ramped that effort up to take it to production as Project Brainwave, which shipped at large scale in 2017, accelerating neural network inference for Bing and Office. In 2018, when I moved into Azure, my team started building custom AI supercomputers at scale. Some of that work involved deep algorithmic work, which culminated in this year’s announcement of the MX consortium, which standardized 4-, 6-, and 8-bit datatypes for ultra-efficient AI computation.

🛠 AI Work
I’m not actually one of the co-creators. I’ve been working closely with Chi Wang, Gagan Bansal, and Ahmed Awadallah on the AutoGen team. When I met them last summer, I realized the importance of this area and of what they were building, and started meeting with the team weekly to see how I could help. The potential of this area (and project) is to uplevel the capabilities of AI working with humans, allowing everyone to achieve more, which is in line with Microsoft’s mission as a company. The team built a beautiful library that made multi-agent orchestration, and using human feedback in the loop, simple and powerful. That's one reason for its great success and growing popularity on GitHub. What I wanted to see was a scientific study of why different patterns work well or poorly. What combination of agent capabilities would be most effective at solving tasks? How would you know that a particular subtask was complete and correct, without a human monitoring each agent interaction? In the open-source community, huge numbers of people are leveraging AutoGen in creative ways and solving surprising problems. One pattern that we see as fundamental is the "generator+critic" pattern, where one agent generates content (writing, code, etc.) and another agent critiques it (finds bugs, etc.). The agents can iterate until the solution is correct, or find problems in the environment and automatically install the packages needed for the generated code to run. Right now, the AutoGen user has the option to be in the loop for any interaction, which is super valuable as we are figuring out how to apply this technology to successively larger problems.
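The generator+critic iteration described above can be sketched in a few lines. This is an illustrative toy, not AutoGen's API: the `generate` and `critique` functions are stubs standing in for LLM-backed agents (here the stub critic simply executes the draft and checks a result), but the loop structure is the essence of the pattern.

```python
# Minimal sketch of the generator+critic pattern. Stub functions stand in
# for LLM-backed agents; the loop iterates until the critic is satisfied.

def generate(task, feedback):
    # Stand-in for a generator agent: emits a draft, revised after feedback.
    if feedback is None:
        return "def add(a, b): return a - b"   # first draft has a bug
    return "def add(a, b): return a + b"       # corrected draft

def critique(draft):
    # Stand-in for a critic agent: returns feedback, or None if satisfied.
    namespace = {}
    exec(draft, namespace)
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) should be 5; check the operator"
    return None

def generator_critic(task, max_rounds=5):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(draft)
        if feedback is None:          # critic accepts: stop iterating
            return draft
    raise RuntimeError("no accepted draft within the round budget")

print(generator_critic("write an add function"))
```

In a real deployment both roles would be prompts to a model, and the critic's feedback would be appended to the generator's context on each round.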
Ultimately, we will want more of an agent communication graph to be "closed loop", meaning that humans don't need to review the generated information and can instead interact in an open-loop fashion at a higher level (for example, refining specifications where they are imprecise, or giving feedback about how the agents are interacting if the system can't make forward progress). Ultimately, graphs of human collaborations and graphs of agent collaborations will be interleaved, and I think we'll find that the ideal interleavings will be quite surprising. The two most important problems (at least as I am thinking about them currently) are:
AutoGen has three core components that contributed to its success early on. First, it is incredibly simple and lightweight; setting up multiple agents interacting through "conversational programming" is straightforward. That ease of use allows people to get up and running quickly. Second, the abstractions AutoGen provides are fully general, but they can be realized in useful, specific ways: the notion of a "User Proxy Agent" allows people to choose to be an agent in the graph, intercept messages, provide their own, or allow the agents to run. That capability greatly simplifies the ability to keep things on track, especially in the early days when we don't know how to ground the conversations and inter-agent collaborations fully. Third, AutoGen’s flexible topologies allow for arbitrary and creative organizations of agent graphs. An example is AutoGen’s Group Chat Manager, which allows an arbitrary set of agents to participate in a chat. The Group Chat Manager selects the agent that seems like the best choice to respond at each iteration of the conversation. This dynamism allows many types of agents to work together with low friction from the user or programmer. Put together, these three components allow people new to the platform to build sophisticated groups of agents without having to do a lot of debugging or experimentation. That low friction is central to AutoGen's success and momentum.
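The "User Proxy Agent" idea, a node in the agent graph that can intercept each message and either auto-reply or hand control to a human, can be sketched as follows. The `UserProxy` class and `relay` method here are hypothetical names for illustration, not AutoGen's actual interface.

```python
# Sketch of the user-proxy pattern: an intercept point in the agent graph
# that records traffic and decides whether a human or an auto-reply answers.

class UserProxy:
    def __init__(self, human_input=None):
        # human_input: callable that returns a person's reply, or None
        # for fully automatic operation.
        self.human_input = human_input
        self.transcript = []

    def relay(self, sender, message):
        # Intercept point: log the message, then choose who replies.
        self.transcript.append((sender, message))
        if self.human_input is not None:
            return self.human_input(message)   # human steps into the loop
        return "APPROVED"                      # default auto-reply

# Automatic mode: the proxy lets the agents run.
auto = UserProxy()
print(auto.relay("coder", "here is the patch"))          # APPROVED

# Human-in-the-loop mode: the proxy injects a person's reply.
human = UserProxy(human_input=lambda msg: "please add tests")
print(human.relay("coder", "here is the patch"))         # please add tests
```

The useful property is that switching between supervised and unsupervised operation is a one-line change at construction time, while the rest of the agent graph is unchanged.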
The most popular innovation in AutoGen’s multi-agent coordination is the Group Chat Manager. It uses the LLM capabilities themselves to guide arbitrary collections of agents working together, as opposed to exposing that complexity to the user in a space that is new and not well understood. Beyond Group Chat Manager, AutoGen also supports many popular conversational patterns, such as one-to-one, hierarchical, and nested chats. Over time, as we understand the patterns that work well for different types and compositions of groups of agents, specific point functionality like the Group Chat Manager may diminish in importance, but it's been incredibly helpful for getting high value, low-friction experiences off the ground for users quickly. AutoGen also supports some interesting features, such as dynamic agents, which can—on the fly—decide to initiate and consult new agents. One of the exciting aspects of the exploding popularity of this tool is seeing the surprising and creative ways that users are leveraging these more advanced features.
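The orchestration loop behind this idea is simple to sketch: a manager repeatedly picks which agent should speak next based on the conversation so far. AutoGen's Group Chat Manager asks the LLM itself to make that choice; in this self-contained illustration a keyword heuristic stands in for the model, and all names are hypothetical.

```python
# Sketch of group-chat speaker selection. A real Group Chat Manager would
# prompt an LLM to pick the next speaker; a keyword match stands in here.

def pick_speaker(last_message, agents):
    # agents: mapping of agent name -> keywords it is best suited for.
    for name, keywords in agents.items():
        if any(k in last_message.lower() for k in keywords):
            return name
    return next(iter(agents))          # fallback: first registered agent

agents = {
    "coder":    ["code", "bug", "function"],
    "writer":   ["draft", "blog", "summary"],
    "reviewer": ["review", "check", "approve"],
}

print(pick_speaker("please review this patch", agents))   # reviewer
print(pick_speaker("write a blog draft", agents))         # writer
```

Swapping the heuristic for an LLM call is exactly what makes the real manager handle arbitrary, previously unseen agent collections without hand-written routing rules.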
AutoGen's value (today) is in low-friction assemblage of agents, tools, and human feedback, which results in this “conversation-centric computing” paradigm. The core technique is simple: basic message passing among agents, humans, and tools (code). There is really nothing special in the messaging; AutoGen's popularity really resides in the low friction with which interesting combinations of agents can be brought up and running quickly. Many of the capabilities in OpenAI's platform enhance AutoGen's capabilities as well, since AutoGen sits above the level of large language models.
Because AutoGen sees the entire agent graph, it can make optimizations in the back end. Some of the optimizations AutoGen supports include performance tuning, transparent error handling, and caching. AutoGen does not optimize model inference itself, but rather calls LLMs through pre-existing APIs that leverage inference optimizations. Additionally, since AutoGen is model independent, over time it can support a fleet of optimized per-topic "expert agents" that are called where appropriate, rather than calling expensive foundation models for every type of agent. Specifically, we are exploring advanced AI model techniques such as those contained in MSR's Orca and phi models.
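The caching optimization mentioned above follows directly from the framework mediating every model call: responses can be memoized by request, so re-running an agent graph with identical prompts costs nothing. A minimal sketch, with `call_model` as a stub standing in for a real (expensive) LLM API call:

```python
# Sketch of response caching for mediated model calls. Identical
# (model, prompt) requests are served from an in-memory cache.

import hashlib
import json

CACHE = {}
CALLS = {"count": 0}   # tracks how often the "expensive" call runs

def call_model(model, prompt):
    CALLS["count"] += 1                 # stand-in for a costly LLM call
    return f"reply<{prompt}>"

def cached_call(model, prompt):
    # Key on the full request so different models/prompts never collide.
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in CACHE:
        CACHE[key] = call_model(model, prompt)
    return CACHE[key]

cached_call("some-model", "hello")
cached_call("some-model", "hello")      # second call is served from cache
print(CALLS["count"])                   # 1
```

Real frameworks add cache invalidation knobs (e.g. a seed or temperature in the key) since a cached reply is only valid when the request is truly deterministic; that detail is omitted here.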
We have two parallel tracks that we are pursuing. The first is to move the platform forward with requests from the open-source community and integration with new capabilities like OpenAI's Assistants/custom GPTs APIs. The second is to advance the science of automated problem solving. One direction is to leverage learning loops to identify which combinations of agents best solve problems. Another direction is to advance the understanding of how to partition tasks automatically into solvable sub-tasks with AutoGen.
All of these frameworks are useful and provide a different (but often overlapping) set of capabilities. All of them are essentially running experiments to see which features will be most useful, which abstractions people most like, and which classes of problems each set of capabilities can address. And given that they are all either open source or open access, they will compose in interesting ways. Semantic Kernel supports both AutoGen and its native multi-agent approach. AutoGen just integrated with OpenAI’s Assistant/Custom GPT developer interfaces. I think having an ebb and flow, and varying levels of integration between these frameworks, is allowing the community to experiment rapidly and have all of us advance the utility of these frameworks more quickly than if they were rigid verticals with no cross pollination.

💥 Miscellaneous – a set of rapid-fire questions
I don't have a favorite area, there are a ton of research problems I'm interested in and I've historically worked across many areas. Understanding biological neural networks is one current focus. In the past, I’ve spent a lot of time working on more efficient silicon architectures, particularly dataflow architectures, and advanced numerical quantization approaches for deep learning. Another area I'm excited about is programming languages for hardware synthesis, both ASICs and reconfigurable computing.
There are several important research areas and problems where breakthroughs would take multi-agent systems to the next level. One is a more formal view of explainability ... what semantic features in LLMs do different prompts invoke, and which semantic hierarchies invoked across multiple agents are most effective at solving problems, working together, being creative, etc.? These models contain so much information, but we don't have good structures for reasoning about how even one invocation works, let alone how to think about multiple invocations collaborating. I expect this area to advance empirically for now, with learning loops improving the empirical results, but it would be wonderful to have some deeper theory to understand why different combinations of agents work well or poorly. Second, having formal abstractions to reason about correctness for a wide range of problems would be good. Things like code are testable, because they (ideally) have precise specifications. But applying a notion of "correct" or "good enough" to a wide range of problems will allow multi-agent systems to be much more effective. Finally, we need formal structures to support decomposition and recomposition of tasks into subtasks and back into tasks. Currently our approaches are ad hoc, and having formal structures to solve problems hierarchically (for general problems) will be essential. These structures may also change how we architect solutions; sort of an AI version of Conway's Law. Another Conway’s Law-related observation is that the capabilities of the models will also change the topology of the ideal multi-agent solutions, an observation we refer to as Gabuchi’s Law.
It's a really great (and tough) question. I recently learned from Eric Horvitz, Microsoft’s Chief Scientific Officer, that the originator of the "technological singularity" concept was actually John von Neumann, but he used it in a different fashion than how Ray Kurzweil and others use it today. He meant it as the point where technology was advancing so rapidly that it was not possible to extrapolate and make predictions about even the near-term future. I feel like we are at that point; I built a multi-agent application this week that would not have been possible just two weeks ago. But researchers should predict, so I'll make a prediction that will likely be wrong: In five years we will have a much deeper understanding of how human collaborative graphs and AI collaborative graphs work. We'll be able to mix and match them to design systems that can give us much better outcomes on hard problems. My personal dream is that we can use these capabilities to solve problems, like building a more fair or more sustainable society, that are beyond our reach today because they interact with large-scale human and sociotechnical systems. It's also possible that these technologies will lead us to bigger problems. Just like evolution is unpredictable, it's unclear what the driving forces that will guide how these technologies shape society will be (what is the equivalent of natural selection?). In part that is up to us, to the extent that we can understand and guide how this technology affects society. We might need sophisticated multi-agent systems to understand how to steer sophisticated multi-agent systems for responsible use. All of us have a collective responsibility to steer these powerful technologies in directions that do more good than harm.