Since the launch of ChatGPT, the initial wave of generative AI applications has largely revolved around chatbots that utilize the Retrieval Augmented Generation (RAG) pattern to respond to user prompts. While there is ongoing work to enhance the robustness of these RAG-based systems, the research community is now exploring the next generation of AI applications — a common theme being the development of autonomous AI agents.
Agentic systems incorporate advanced capabilities like planning, iteration, and reflection, which leverage the model’s inherent reasoning abilities to accomplish tasks end-to-end. Paired with the ability to use tools, plugins, and function calls — agents are empowered to tackle a wider range of general-purpose work.
Reasoning is a foundational building block of the human mind. Without reasoning one would not be able to make decisions, solve problems, or refine plans when new information is learned — essentially misunderstanding the world around us. If agents don’t have strong reasoning skills then they might misunderstand their task, generate nonsensical answers, or fail to consider multi-step implications.
We find that most agent implementations contain a planning phase which invokes one of the following techniques to create a plan: task decomposition, multi-plan selection, external module-aided planning, reflection and refinement and memory-augmented planning [1].
Another benefit of utilizing an agent implementation over just a base language model is the agent’s ability to solve complex problems by calling tools. Tools can enable an agent to execute actions such as interacting with APIs, writing to third party applications, and more. Reasoning and tool calling are closely intertwined and effective tool calling has a dependency on adequate reasoning. Put simply, you can’t expect an agent with poor reasoning abilities to understand when is the appropriate time to call its tools.
Our findings emphasize that both single-agent and multi-agent architectures can be used to solve challenging tasks by employing reasoning and tool calling steps.
For single agent implementations, we find that successful goal execution is contingent upon proper planning and self-correction [1, 2, 3, 4]. Without the ability to self-evaluate and create effective plans, single agents may get stuck in an endless execution loop and never accomplish a given task or return a result that does not meet user expectations [2]. We find that single agent architectures are especially useful when the task requires straightforward function calling and does not need feedback from another agent.
However, we note that single agent patterns often struggle to complete a long sequence of sub tasks or tool calls [5, 6]. Multi-agent patterns can address the issues of parallel tasks and robustness since multiple agents within the architecture can work on individual subproblems. Many multi-agent patterns start by taking a complex problem and breaking it down into several smaller tasks. Then, each agent works independently on solving each task using their own independent set of tools.
Architectures involving multiple agents present an opportunity for intelligent labor division based on capabilities as well as valuable feedback from diverse agent personas. Numerous multi-agent architectures operate in stages where teams of agents are dynamically formed and reorganized for each planning, execution, and evaluation phase [7, 8, 9]. This reorganization yields superior outcomes because specialized agents are utilized for specific tasks and removed when no longer required. By matching agent roles and skills to the task at hand, agent teams can achieve greater accuracy and reduce the time needed to accomplish the goal. Crucial features of effective multi-agent architectures include clear leadership within agent teams, dynamic team construction, and efficient information sharing among team members to prevent important information from getting lost amidst superfluous communication.
Our research highlights notable single agent methods such as ReAct, RAISE, Reflexion, AutoGPT + P, LATS, and multi agent implementations such as DyLAN, AgentVerse, and MetaGPT, which are explained more in depth in the full text.
Single Agent Patterns:
Single agent patterns are generally best suited for tasks with a narrowly defined list of tools and where processes are well-defined. They don’t face poor feedback from other agents or distracting and unrelated chatter from other team members. However, single agents may get stuck in an execution loop and fail to make progress towards their goal if their reasoning and refinement capabilities aren’t robust.
Multi Agent Patterns:
Multi agent patterns are well-suited for tasks where feedback from multiple personas is beneficial in accomplishing the task. They are useful when parallelization across distinct tasks or workflows is required, allowing individual agents to proceed with their next steps without being hindered by the state of tasks handled by others.
Feedback and Human in the Loop
Language models tend to commit to an answer earlier in their response, which can cause a ‘snowball effect’ of increasing diversion from their goal state [10]. By implementing feedback, agents are much more likely to correct their course and reach their goal. Human oversight improves the immediate outcome by aligning the agent’s responses more closely with human expectations, yielding more reliable and trustworthy results [11, 8]. Agents can be susceptible to feedback from other agents, even if the feedback is not sound. This can lead the agent team to generate a faulty plan which diverts them from their objective [12].
Information Sharing and Communication
Multi-agent patterns have a greater tendency to get caught up in niceties and ask one another things like “how are you”, while single agent patterns tend to stay focused on the task at hand since there is no team dynamic to manage. This can be mitigated by robust prompting. In vertical architectures, agents can fail to send critical information to their supporting agents not realizing the other agents aren’t privy to necessary information to complete their task. This failure can lead to confusion in the team or hallucination in the results. One approach to address this issue is to explicitly include information about access rights in the system prompt so that the agents have contextually appropriate interactions.
Impact of Role Definition and Dynamic Teams
Clear role definition is critical for both single and multi-agent architectures. Role definition ensures that the agents understands their assigned role, stay focused on the provided task, execute the proper tools, and minimizes hallucination of other capabilities. Establishing a clear group leader improves the overall performance of multi-agent teams by streamlining task assignment. Dynamic teams where agents are brought in and out of the system based on need have also been shown to be effective. This ensures that all agents participating in the tasks are strong contributors.
Summary of Key Insights
The key insights discussed suggest that the best agent architecture varies based on use case. Regardless of the architecture selected, the best performing agent systems tend to incorporate at least one of the following approaches: well defined system prompts, clear leadership and task division, dedicated reasoning / planning- execution — evaluation phases, dynamic team structures, human or agentic feedback, and intelligent message filtering. Architectures that leverage these techniques are more effective across a variety of benchmarks and problem types.
Our meta-analysis aims to provide a holistic understanding of the current AI agent landscape and offer insight for those building with existing agent architectures or developing custom agent architectures. There are notable limitations and areas for future improvement in the design and development of autonomous AI agents such as a lack of comprehensive agent benchmarks, real world applicability, and the mitigation of harmful language model biases. These areas will need to be addressed in the near-term to enable reliable agents.