Multi-Agent Systems Fundamentals - A Personal Experience

Multi-Agent Systems (MAS) and the Agentic approach have been getting a lot of attention lately (for example, see Andrew Ng’s articles). While these are exciting developments, the Multi-Agent approach to solving problems has been around for quite some time and has some very sound theoretical underpinnings. For example, I remember using Jacques Ferber’s book (“Multi-agent systems: An introduction to distributed artificial intelligence”) extensively as a reference back in 2000 when we were developing the first Saffron intelligent agent platform. Mind you, all that was done through large Associative Memories, and without any access to LLMs. Times have changed, and LLMs have introduced at least an order of magnitude improvement in the quality we can expect from MAS.

While Andrew’s articles take the perspective that the Agentic approach is a way to improve the quality of results we get from LLMs, I would reverse that outlook. From my perspective, LLMs are an evolution of the underlying tools available to MAS to improve their quality of results. Perspectives, however, are not facts; they are just different views on some underlying truth.

In this article I want to cover some fundamentals of the MAS approach, as a precursor to additional articles being planned by my Catio team to expand on the MAS work being done internally. At Catio we’ve been building complex MAS workflows on top of LLMs and we’d love to share some of the best practices we’ve developed, and some of the scars and gotchas. The next articles will be more practical, I promise. Stay tuned.

Introduction to Multi-Agent Systems

At its core, a Multi-Agent System consists of multiple interacting autonomous entities (the “intelligent” agents) capable of making decisions and performing tasks independently, yet they collaborate to achieve common goals or to solve complex problems more efficiently than any single agent could. In his series of articles Andrew Ng highlights the importance of agent workflows in AI, particularly for enhancing the performance of Large Language Models. Multi-agent collaboration enhances problem-solving capabilities by enabling agents to debate ideas and split tasks effectively, thereby improving overall system performance. As a complement to Andrew Ng’s “4 characteristics” (Reflection, Tool Use, Planning, Collaboration), I would suggest that, from another perspective, these are the key characteristics of Multi-Agent Systems:

Autonomy: Agents operate without direct intervention from humans or other agents, making their own decisions based on their perception of the environment.
Social Ability: Agents interact with each other through communication, which is essential for coordination and cooperation.
Reactivity: Agents perceive their environment and respond to changes in a timely manner.
Proactiveness: Agents do not merely act in response to their environment; they are capable of taking initiative to fulfill their designed objectives.
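
To make these four characteristics concrete, here is a minimal Python sketch. The ThermostatAgent class, its thresholds, and the forecast input are all made up for illustration; they are not taken from any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Toy agent; the name, fields and thresholds are illustrative assumptions."""
    name: str
    target: float                                # the agent's own design objective
    outbox: list = field(default_factory=list)   # messages for other agents

    def perceive(self, temperature: float) -> str:
        # Reactivity: respond to the current state of the environment.
        if temperature < self.target - 1.0:
            return "heat"
        if temperature > self.target + 1.0:
            return "cool"
        return "idle"

    def step(self, temperature: float, forecast: float) -> str:
        # Autonomy: the agent picks its own action; nobody commands it directly.
        action = self.perceive(temperature)
        # Proactiveness: act on the goal before the environment forces it.
        if action == "idle" and forecast < self.target - 1.0:
            action = "preheat"
        # Social ability: tell peers what was decided so they can coordinate.
        self.outbox.append({"from": self.name, "action": action})
        return action
```

Calling step() repeatedly with fresh readings gives a simple perceive-decide-act loop; a real agent would replace the hard-coded rules with richer decision making, such as an LLM call.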

The rest of this article will explore the history and fundamentals of Multi-Agent Systems, Distributed Problem Solving, agent architectures and their applications.

A Brief History of Multi-Agent Systems

The concept of Multi-Agent Systems has roots that go back to the early days of AI and computer science. The origins of MAS can be traced to the 1970s and 1980s when researchers began to explore the idea of Distributed Artificial Intelligence (DAI). DAI emerged from the need to solve problems that were too complex for a single agent to handle. Early research focused on distributed problem-solving and the coordination of multiple agents working together to achieve a common goal.

The foundational concept in MAS is the notion of an agent. In the context of MAS, an agent is an autonomous entity capable of perceiving its environment, making decisions, and taking actions to achieve specific objectives. This concept was influenced by work in robotics, artificial life, and cognitive science, where the idea of autonomous systems was already being explored.

Throughout the 1980s and 1990s, MAS research gained momentum as computing power increased and networking technologies advanced. Long gone are the days of solving large associative memory matrices on SPARCstations. Ask me how I know. This period saw the development of key theoretical frameworks and algorithms that form the backbone of modern MAS. Researchers focused on agent communication languages, protocols for negotiation and cooperation, and mechanisms for conflict resolution.

There were many milestones along the way, such as the introduction of the Contract Net Protocol by Reid G. Smith in 1980, which provided a framework for task distribution and negotiation among agents. Another crucial development was the creation of the Belief-Desire-Intention (BDI) model, which formalized the internal states and decision-making processes of agents. For me, the theoretical foundation of large MAS was consolidated in Jacques Ferber’s book in 1999, a book I still use to this day.

Today, MAS is a mature field with a rich body of theoretical and practical knowledge. It continues to evolve, driven by the growing complexity of problems, the increasing demand for intelligent, distributed solutions, and the introduction of Large Language Models as a foundational technology.

MAS and Distributed Problem Solving

Many real-world problems are too intricate to be solved by an LLM, a single agent or a centralized system. This is where Multi-Agent Systems excel, offering a robust framework for what is known as Distributed Problem Solving (DPS). By leveraging the capabilities of multiple agents, MAS can tackle large-scale, dynamic problems more effectively than traditional approaches.

Distributed problem solving involves breaking down a complex problem into smaller, manageable sub-problems that can be solved concurrently by multiple agents. Each agent works on a part of the problem, and through communication and coordination, the agents combine their individual solutions to form a comprehensive solution to the original problem.
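
The decompose, solve concurrently, and combine pattern can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "agents" are plain worker threads, the problem is a sum of squares, and the function names (decompose, solve_subproblem, combine, solve_distributed) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(numbers, parts):
    """Split the problem into roughly equal sub-problems."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def solve_subproblem(chunk):
    """Work done by one agent: here, a partial sum of squares."""
    return sum(n * n for n in chunk)


def combine(partials):
    """Merge the agents' partial answers into the overall solution."""
    return sum(partials)


def solve_distributed(numbers, agents=3):
    chunks = decompose(numbers, agents)
    # Each chunk is handed to a separate worker and solved concurrently.
    with ThreadPoolExecutor(max_workers=agents) as pool:
        partials = list(pool.map(solve_subproblem, chunks))
    return combine(partials)
```

In a real MAS the sub-problems are rarely this independent, which is why the communication and coordination step matters as much as the decomposition.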

Andrew Ng has used the analogy of writing an essay to illustrate problem decomposition and iterative solving. While writing an essay seems like a simple task, this approach lends itself extremely well to solving complex problems. At Catio we use a similar approach to DPS to decompose the problem of understanding a set of business and product requirements, understanding a current architecture and tech stack, and then combining the two to come up with a set of recommendations to optimize a given tech stack architecture for a variety of objective functions (cost, throughput, latency, etc.).

MAS facilitates DPS through three main mechanisms:

Task Decomposition: The problem is divided into smaller tasks that can be assigned to different agents based on their capabilities and resources.
Agent Coordination: Agents communicate and coordinate their actions to ensure that their individual efforts contribute to the overall solution. Coordination can be achieved through negotiation, planning, and synchronization.
Resource Allocation: MAS optimizes the use of resources by distributing tasks among agents based on their availability and efficiency.
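
As a rough illustration of how coordination through negotiation and resource allocation fit together, here is a heavily simplified sketch in the spirit of the Contract Net Protocol mentioned earlier: a manager announces each task, agents bid, and the cheapest bidder wins. The Contractor class, the cost numbers and the allocate function are assumptions for this example, not a full implementation of the protocol.

```python
from dataclasses import dataclass


@dataclass
class Contractor:
    name: str
    skills: dict          # task kind -> estimated cost (hypothetical units)
    capacity: int = 1     # how many tasks this agent can still accept

    def bid(self, kind):
        """Return a cost estimate, or None to decline the announcement."""
        if self.capacity <= 0 or kind not in self.skills:
            return None
        return self.skills[kind]


def allocate(tasks, contractors):
    """Announce each task, collect bids, award it to the cheapest bidder."""
    awards = {}
    for task_id, kind in tasks:
        bids = [(c.bid(kind), c) for c in contractors]
        bids = [(cost, c) for cost, c in bids if cost is not None]
        if not bids:
            awards[task_id] = None      # nobody could take the task
            continue
        _, winner = min(bids, key=lambda b: b[0])
        winner.capacity -= 1            # the winner's capacity is consumed
        awards[task_id] = winner.name
    return awards
```

Because an agent with no remaining capacity declines to bid, work naturally spills over to the next-best agent, and a task with no bidders is reported as unassigned rather than silently dropped.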

In addition to the current cutting edge work on tech stacks and cloud architecture, DPS techniques have been used very successfully in solving large, complex problems in the following areas:

Smart Grids: In smart grid systems, MAS are used to manage the distribution of electricity. Agents representing different grid components (e.g., power plants, substations, and consumers) work together to balance supply and demand, optimize energy usage, and improve the grid's resilience to disruptions.
Supply Chain Management: MAS can optimize supply chain operations by coordinating the actions of agents representing suppliers, manufacturers, distributors, and retailers. This leads to improved inventory management, reduced lead times, and enhanced responsiveness to market changes.
Traffic Management: MAS are used in traffic management systems to optimize the flow of vehicles in urban areas. Agents representing traffic lights, vehicles, and traffic sensors communicate to adjust signal timings, reroute traffic, and reduce congestion.

The relevance of MAS in distributed problem solving is undeniable. By enabling multiple agents to collaborate and solve complex problems efficiently, MAS provide a powerful tool for tackling the challenges of modern, interconnected systems.

Agent Architecture

The architecture of agents within a MAS is pivotal to their functionality and efficiency. The design of these architectures determines how agents perceive their environment, make decisions, and interact with other agents. A well-constructed agent architecture enables the agents to operate autonomously, adapt to changes, and cooperate effectively to solve complex problems.

Agent architecture refers to the underlying structure and design principles that govern the behavior and interactions of agents. It encompasses the internal mechanisms and processes that enable agents to function independently and collaboratively. There are three main types of agent architectures:

Reactive Agents: Reactive agents operate on a stimulus-response basis. They do not maintain an internal state or model of the environment. Their strength lies in simplicity and speed, making them suitable for dynamic and rapidly changing environments. Their main weakness is their limited ability to handle complex tasks that require long-term planning or learning from past experiences.
Deliberative Agents: Deliberative agents possess an internal model of the environment and use this model to plan actions. They are capable of reasoning and long-term planning. Their strength is their ability to handle complex tasks that require strategic thinking and foresight. This strength is also a weakness in terms of higher computational requirements and slower response times compared to reactive agents.
Hybrid Agents: Hybrid agents combine elements of both reactive and deliberative architectures. They use reactive mechanisms for immediate responses and deliberative mechanisms for strategic planning. This balance between responsiveness and strategic capability makes them adaptable to a wide range of environments and tasks.
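
The difference between the three architectures is easier to see side by side. The sketch below is a toy example under my own assumptions (a grid world where '#' marks a wall, an "obstacle" percept, and the class and function names); it is not how any specific platform implements them.

```python
from collections import deque


def reactive_agent(percept):
    """Stimulus-response: no internal state, no model of the world."""
    return "brake" if percept == "obstacle" else "forward"


class DeliberativeAgent:
    """Keeps a model of the world (a grid map) and plans a route on it."""

    def __init__(self, grid):
        self.grid = grid  # list of strings, '#' marks a wall

    def plan(self, start, goal):
        # Breadth-first search over the internal model.
        frontier, came_from = deque([start]), {start: None}
        while frontier:
            cell = frontier.popleft()
            if cell == goal:
                break
            r, c = cell
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                nr, nc = nxt
                inside = 0 <= nr < len(self.grid) and 0 <= nc < len(self.grid[0])
                if inside and self.grid[nr][nc] != "#" and nxt not in came_from:
                    came_from[nxt] = cell
                    frontier.append(nxt)
        if goal not in came_from:
            return []                      # goal is unreachable in the model
        path, cell = [], goal
        while cell is not None:            # walk back from goal to start
            path.append(cell)
            cell = came_from[cell]
        return path[::-1]


class HybridAgent(DeliberativeAgent):
    """Reactive layer handles emergencies; otherwise follow the plan."""

    def act(self, percept, position, goal):
        if percept == "obstacle":            # reactive layer takes priority
            return ("brake", position)
        path = self.plan(position, goal)     # deliberative layer
        if len(path) > 1:
            return ("move", path[1])
        return ("stay", position)
```

The reactive function answers instantly but cannot route around a wall it has not bumped into, while the planner pays for a search on every call. The hybrid agent only plans when nothing urgent is happening, which is the trade-off described above.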

The architecture of agents in MAS is critical to their performance and effectiveness. By understanding and implementing robust, scalable, and adaptable architectures, developers can create agents that excel in a variety of complex and dynamic environments.

Agent Communications

In Multi-Agent Systems, effective communication is the cornerstone of collaboration and coordination among agents. Communication allows agents to share information, negotiate tasks, and synchronize their actions to achieve common goals. The design and implementation of communication protocols and languages are crucial to the success of MAS. Communication in MAS is required for the following purposes:

Coordination: Agents need to coordinate their actions to avoid conflicts and ensure that their collective efforts are aligned toward the system's objectives.
Cooperation: Communication enables agents to work together, share resources, and collaborate on tasks that require joint effort.
Negotiation: In competitive or resource-constrained environments, agents must negotiate to reach mutually beneficial agreements and resolve conflicts.
Information Sharing: Agents share information about the environment and their internal states, which enhances their ability to make informed decisions and adapt to changes.

Several methods and best practices can be employed to ensure efficient and reliable communication among agents:

Message Passing: Agents exchange messages using a predefined protocol. Message passing can be synchronous (requiring immediate response) or asynchronous (allowing delayed responses).
Broadcasting: In some scenarios, agents broadcast messages to all other agents in the system. This is useful for disseminating information quickly but can lead to network congestion.
Direct Communication: Agents communicate directly with specific other agents, which can reduce communication overhead and increase efficiency.
Middleware: Middleware solutions, such as agent platforms and frameworks (e.g., CrewAI, LangGraph, and AutoGen), provide built-in support for communication, making it easier to implement and manage interactions among agents.
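
To illustrate the difference between direct communication and broadcasting, here is a minimal in-process message bus with asynchronous (queued) delivery. The MessageBus class and its method names are hypothetical; this is not the API of CrewAI, LangGraph, AutoGen, or any other framework.

```python
from collections import deque


class MessageBus:
    """One mailbox per agent; messages wait in a queue until they are read."""

    def __init__(self):
        self.mailboxes = {}

    def register(self, agent_name):
        self.mailboxes[agent_name] = deque()

    def send(self, sender, receiver, body):
        """Direct communication: one sender, one named receiver."""
        if receiver not in self.mailboxes:
            raise KeyError(f"unknown agent: {receiver}")
        self.mailboxes[receiver].append({"from": sender, "body": body})

    def broadcast(self, sender, body):
        """Broadcasting: every other registered agent gets a copy."""
        for name, box in self.mailboxes.items():
            if name != sender:
                box.append({"from": sender, "body": body})

    def receive(self, agent_name):
        """Pop the oldest pending message, or None if the mailbox is empty."""
        box = self.mailboxes[agent_name]
        return box.popleft() if box else None
```

Note that a single broadcast creates one message per other agent, so the traffic grows with the size of the system. That is the congestion risk mentioned above, and a good reason to prefer direct messages when the recipient is known.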

As in most communication protocols, however, it’s important to separate the content encoding from the means to communicate it. There has traditionally been a lot of emphasis on Agent Communication Languages (ACLs) and Content Languages. ACLs are designed to enable structured and standardized communication between agents. They specify the syntax and semantics of messages that agents can exchange. While ACLs define the structure of messages, content languages specify how the content of the message is structured. These can include languages based on logic or those based on ontologies providing a shared vocabulary for agents to understand the context of the information exchanged.

ACLs such as FIPA-ACL (Foundation for Intelligent Physical Agents - Agent Communication Language) and KQML (Knowledge Query and Manipulation Language) have been developed for that purpose. While I’ve used KQML and OWL (the Web Ontology Language) in the past, the advent of LLMs and their understanding of context and semantics has introduced a major shift in how agents can communicate. Modern agent communication is geared more towards a prompt type of interaction than towards the more structured languages. At Catio we use prompts and English-language conversations for content, since the underlying LLMs do not need a formal ontology such as OWL to understand the intent, and we are developing our own framework for message passing, message persistence, and retrieval.
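
One way to picture the separation between message structure and message content is a small envelope whose fields loosely follow FIPA-ACL parameters (performative, sender, receiver, conversation id), while the content itself is plain English. This is a sketch under my own assumptions, not the FIPA wire format and not a description of Catio's internal framework; the Envelope class and the agent names are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Envelope:
    performative: str      # intent of the message, e.g. "request" or "inform"
    sender: str
    receiver: str
    content: str           # free-form English, i.e. a prompt for the receiver
    conversation_id: str   # ties related messages together

    def to_json(self) -> str:
        """Serialize for transport or persistence."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "Envelope":
        return Envelope(**json.loads(raw))

    def reply(self, performative: str, content: str) -> "Envelope":
        """Build an answer that stays in the same conversation."""
        return Envelope(performative, self.receiver, self.sender,
                        content, self.conversation_id)
```

The envelope stays small and machine-readable so that routing, persistence and retrieval are easy, while the content field carries whatever natural-language text the receiving LLM-backed agent needs.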

Applications of Multi-Agent Systems

Multi-Agent Systems have found applications in a wide range of industries, solving complex problems that require coordination, cooperation, and adaptability. By leveraging the capabilities of multiple autonomous agents, MAS can enhance efficiency, scalability, and robustness in various domains.

Healthcare:
- Patient Monitoring: Agents monitor patient health data in real-time, detect anomalies, and alert healthcare providers. For example, in smart hospitals, MAS can manage patient care by coordinating medical devices, patient records, and healthcare professionals.
- Resource Management: Agents manage the allocation of medical resources such as hospital beds, medical staff, and equipment to optimize patient care and reduce waiting times.
Finance:
- Automated Trading: MAS are used in stock trading to analyze market trends, execute trades, and optimize portfolios. Each agent can represent different trading strategies and work together to maximize returns.
- Fraud Detection: Agents collaborate to detect fraudulent activities by analyzing transaction patterns and identifying anomalies.
Logistics and Supply Chain Management:
- Inventory Management: Agents monitor inventory levels, predict demand, and coordinate restocking to ensure optimal inventory levels.
- Transportation and Distribution: MAS optimize routing and scheduling of deliveries, taking into account traffic conditions, delivery windows, and vehicle capacities.
Smart Grids:
- Energy Management: Agents manage energy production, distribution, and consumption in smart grids. They optimize energy use, balance supply and demand, and improve grid resilience.
- Demand Response: Agents coordinate demand response programs by adjusting energy consumption patterns based on real-time grid conditions and consumer preferences.
Traffic Management:
- Adaptive Traffic Control: Agents manage traffic lights and signals, optimize traffic flow, and reduce congestion. They can adapt to real-time traffic conditions and coordinate with other traffic management systems.
- Autonomous Vehicles: In autonomous vehicle systems, each vehicle acts as an agent. MAS facilitate communication between vehicles to enhance safety, efficiency, and coordination on the roads.
Manufacturing:
- Production Planning: Agents coordinate production schedules, manage resources, and optimize manufacturing processes to improve efficiency and reduce downtime.
- Quality Control: MAS monitor production quality, detect defects, and coordinate corrective actions to ensure high-quality products.
Software Architecture:
- Decision Insight: Agents can learn the requirements and constraints of an organization, along with the existing architecture, and mirror the organizational structure of the various constituencies involved in the decision-making process, including the CTO, architects, tech leads, CFO and so on, in order to arrive at specific recommendations.

The applications of Multi-Agent Systems are vast and varied, demonstrating their versatility and effectiveness in solving complex problems across different industries. By leveraging the power of multiple autonomous agents, MAS can enhance efficiency, scalability, and robustness, making them an invaluable tool in today's increasingly interconnected and dynamic world.

Identifying the Sweet Spot of Multi-Agent Systems

Multi-Agent Systems (MAS) are particularly effective for certain types of applications, while in other scenarios, their use might be excessive and unnecessary. The sweet spot for MAS lies in applications that involve complex, dynamic interactions, require distributed problem-solving, and benefit from the autonomous, adaptive capabilities of multiple agents. In contrast, simpler or more centralized tasks might not justify the complexity and overhead of implementing MAS. Understanding where MAS provide the most value can help in making informed decisions about their deployment.

Ideal Use Cases for MAS:

Complex Distributed Systems:
- Example: Smart Grids
- Why: MAS can efficiently manage and optimize the distribution of resources (like electricity) across a network of autonomous agents, each representing different components (e.g., power plants, substations).
Dynamic Environments:
- Example: Traffic Management
- Why: MAS can adapt to real-time changes in traffic conditions, optimizing flow and reducing congestion through coordinated control of traffic signals and vehicle routing.
Resource Allocation and Management:
- Example: Supply Chain Management
- Why: Agents can represent different entities in the supply chain, coordinating to optimize inventory levels, manage logistics, and respond to demand changes dynamically.
Collaborative Robotics:
- Example: Warehouse Automation (e.g., Amazon’s warehouse robots)
- Why: Multiple robots can work together to pick, pack, and transport items efficiently, improving throughput and operational efficiency.
Simulation and Modeling:
- Example: Software Architecture
- Why: MAS can simulate interactions between various tech stack components (e.g., databases, microservices, ML) to evaluate and optimize tech stack architectures.

Situations Where MAS Might Be Overkill:

Simple, Centralized Problems:
- Example: Basic data processing tasks
- Why: A single-agent system or traditional algorithm can handle these tasks more efficiently without the overhead of managing multiple agents.
Small-Scale Applications:
- Example: Personal scheduling applications
- Why: The complexity of MAS would not be justified for tasks that do not require distributed problem-solving or high levels of coordination.
High-Stakes, Real-Time Decision Making:
- Example: High-frequency trading
- Why: The latency introduced by agent communication and coordination might be unacceptable compared to tightly optimized single-agent systems.
Environments with Limited Interactions:
- Example: Standalone IoT devices with minimal interaction
- Why: If devices operate mostly independently, the benefits of MAS coordination and communication are minimal.

Conclusion

Multi-Agent Systems (MAS) represent a powerful paradigm in artificial intelligence, enabling the coordination and cooperation of multiple autonomous agents to solve complex, distributed problems. From managing smart grids and optimizing supply chains to enhancing robotic systems and generating software architectures, MAS offer unique advantages in scalability, adaptability, and robustness. By understanding their historical development, architectural principles, communication protocols, and real-world applications, we can better appreciate the potential of MAS and identify where they provide the most value.

While MAS are particularly effective in dynamic and complex environments, their use may be overkill for simpler, centralized tasks. Careful evaluation of the problem's complexity and requirements is essential to determine the suitability of MAS.

As AI continues to evolve, MAS will play an increasingly significant role in various domains, offering innovative solutions to the challenges of modern, interconnected systems.

Coming soon: Supercharging AI with MAS. Stay tuned!


------------------------------

About the Author

Toufic Boubez is Catio’s CTO & Co-Founder, previously the VP of Engineering and Global Head of AI and Incubation for Splunk, a 4x CTO / Co-Founder, and earlier in his career the Chief Architect of Service Oriented Architectures at IBM. Toufic writes extensively about the latest best practices with AI and tech stack architectures. Follow Toufic on LinkedIn to stay up to date on his latest thinking and developments.
