I've spent a fair amount of time poking holes in agent systems, and I can tell you they are not as impregnable as the hype makes them seem. Put two or more AI agents in a room together, whether they're LLM-based assistants, robot swarms, or distributed learning systems, and things get messy very quickly.
One agent makes a bad judgment, another agent amplifies it, and before you know it your entire system is basing its decisions on tainted information.
Multi-agent security and coordination risk sits at an awkward intersection of distributed computing, machine learning, and control theory. Classical systems such as sensor networks and robot fleets are already solidly researched; the newer arrivals, particularly LLM agents, are where things get interesting.
This post tours three key failure points I've observed while testing these systems: inter-agent communication (and its exploitation), small-scale failure modes that snowball into system-wide disasters, and consensus mechanisms that break down under attack. If you're building anything that involves multiple agents, this matters.
Why Multi-Agent Systems Are Different From Single-Agent Setups

Individual AI systems are fairly simple: you give the model a task, it works on it, it responds. Multi-agent systems introduce a whole spectrum of complexity, because agents now have to coordinate, exchange information, and reach decisions collectively.
The essential setup: two or more autonomous agents operating in a shared environment, usually over a network. They might be learning, adapting, or simply executing predefined jobs. The trouble starts when those interactions involve untrusted parties, unreliable networks, or outright adversaries.
Security risks here mean someone (or something) actively trying to compromise the system: spoofed data, compromised nodes, denial-of-service attacks, colluding rogue agents. Coordination risks are different: they are failures of collaboration, where agents can't reach agreement, get stuck in loops, or set off chain reactions without any malicious intent at all.
For conventional systems such as robot swarms and sensor networks, researchers have been mapping these risks for over a decade. The problem is framed in terms of consensus protocols, network attacks, and Byzantine fault tolerance.
But LLM-based multi-agent systems? That's a different beast. You're dealing with unstructured communication, opaque computation, and emergent behaviors nobody programmed. In my experience, even simple three-agent systems can exhibit weird failure modes that none of the conventional systems would.
Inter-Agent Communication Vulnerabilities: The Weakest Link
How Agents Actually Talk to Each Other
In traditional multi-agent systems, communication happens over well-defined protocols. Robots share sensor data over secured links. Distributed control systems exchange state through structured messages. Authentication, encryption, role-based access control: all the standard security tooling applies.
LLM-based multi-agent systems communicate very differently, as I found when I started testing them. Agents exchange natural language back and forth. There are no strict schemas, no type checking, no formal verification. It's literally unstructured text flowing through a graph of AI models.
That opens up risks that simply don't exist in traditional systems:
Cross-agent prompt injection. A hostile agent can craft messages designed to influence how other agents interpret their instructions. I saw this firsthand while testing a three-agent system: a rogue router agent embedded extra instructions in its responses, and the other agents treated them as part of their own commands.
Message spoofing and impersonation. Without authentication between agents, it's easy to send forged messages that appear to come from trusted agents (see the sketch after this list). This is a major trust problem in open multi-agent ecosystems where agents discover and interact with each other dynamically.
Man-in-the-middle attacks on agent dialogue. When messages aren't encrypted and communication channels cross untrusted infrastructure, an attacker can intercept and alter conversations between agents. This is especially concerning in distributed LLM agent systems where messages may pass through multiple hops.
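None of this is exotic to fix at the transport level. Below is a minimal sketch (my own illustration, not taken from any particular framework) of authenticating inter-agent messages with an HMAC. It blocks spoofing and tampering, though not injection hidden inside a legitimately signed message:

```python
import hmac
import hashlib
import json

# Hypothetical shared key; a real deployment would provision per-agent-pair keys.
SHARED_KEY = b"demo-key-rotate-me"

def sign_message(sender: str, body: str) -> dict:
    """Attach an HMAC tag so receivers can verify origin and integrity."""
    payload = json.dumps({"sender": sender, "body": body}, sort_keys=True)
    tag = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"sender": sender, "body": body, "tag": tag}

def verify_message(msg: dict) -> bool:
    """Reject anything whose tag doesn't match: spoofed sender or tampered body."""
    payload = json.dumps({"sender": msg["sender"], "body": msg["body"]}, sort_keys=True)
    expected = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["tag"])

msg = sign_message("router", "route task 42 to the summarizer")
assert verify_message(msg)
msg["body"] = "ignore previous instructions"  # tampering breaks the tag
assert not verify_message(msg)
```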
The Schema Problem and Why It Matters
Traditional distributed systems address some of these problems with strict message schemas. Every message has a defined format, required fields, and type constraints. A message that doesn't conform to the schema is rejected.
LLM agents don't have that luxury. Their strength, natural language comprehension and generation, is also their weakness. With free-form text flowing between agents, there is no built-in way to validate message structure or flag malicious content.
Research on agentic AI security points toward hybrid protocols: critical coordination steps use structured message formats, while higher-level reasoning happens in natural language. That work is still at a very early stage, though.
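As a rough sketch of the hybrid idea (the message format and action names here are my own assumptions, not a published protocol): validate the structured coordination fields strictly, and treat the natural-language part as data that is never parsed for instructions.

```python
from dataclasses import dataclass

# Hypothetical protocol: only these coordination actions are legal.
ALLOWED_ACTIONS = {"claim_task", "release_task", "report_status"}

@dataclass
class CoordinationMessage:
    sender: str
    action: str     # structured part: validated against the protocol
    task_id: int
    rationale: str  # free-form natural language: logged, never executed

def validate(msg: CoordinationMessage) -> None:
    """Reject messages that violate the structured half of the protocol."""
    if msg.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {msg.action!r}")
    if msg.task_id < 0:
        raise ValueError("task_id must be non-negative")

validate(CoordinationMessage("planner", "claim_task", 7, "I have spare capacity."))
```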
Data Injection and Sensor Spoofing in Physical Systems
In cyber-physical multi-agent systems, drone swarms, autonomous vehicle fleets, smart grids, communication vulnerabilities play out physically.
Attackers can inject counterfeit sensor data into communication channels. If a drone swarm relies on shared positioning information and one agent reports false coordinates, the whole formation can destabilize. I've worked in simulated environments where simple spoofing attacks set off multi-agent collision chains.
Defenses here are more mature: cryptographic authentication of sensor data, anomaly filters at the control layer, and redundant sensing so information can be cross-checked. But they only help if you plan them in from the start.
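Here's a toy version of the cross-checking idea, with made-up one-dimensional positions: reports that stray too far from the swarm median get dropped before they reach the controller.

```python
import statistics

def filter_spoofed(reports: dict[str, float], max_dev: float = 5.0) -> dict[str, float]:
    """Keep only position reports within max_dev of the swarm median (1-D for brevity)."""
    median = statistics.median(reports.values())
    return {agent: pos for agent, pos in reports.items() if abs(pos - median) <= max_dev}

reports = {"drone_a": 10.1, "drone_b": 9.8, "drone_c": 10.3, "drone_d": 87.0}
print(filter_spoofed(reports))  # drone_d's spoofed coordinate is excluded
```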
Cascading Failures: When One Agent’s Problem Becomes Everyone’s Problem
How Failures Propagate in Connected Systems
The frightening thing about multi-agent systems isn't individual agents breaking; it's how failures spread. A compromised agent emits bad data. Neighboring agents factor that data into their decisions. Those decisions influence agents further downstream. Suddenly the whole system is operating on corrupted assumptions.
This is fundamentally different from single-agent failure. In multi-agent setups, interconnectedness is both a strength and a weakness: the same communication channels that enable coordination also provide paths for failures to spread.
Studies of multi-agent cyber-physical systems classify cascading failures into several patterns:
Local faults spreading through message diffusion. A single compromised node in a sensor network can poison consensus algorithms across the network topology (a toy propagation sketch follows this list).
Amplification during learning. In multi-agent reinforcement learning, when one agent develops a suboptimal or harmful policy, agents that observe and imitate it can adopt the same flaws and feed them back, creating a self-reinforcing cycle of bad behavior.
Context degradation in LLM agent chains. When agents communicate over many turns, each message summarizes or paraphrases the exchange so far. Small discrepancies pile up. In my tests with three agents, shared understanding of the task had drifted far from the original intent after 10-15 exchanges.
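To build intuition for the first pattern, here's a toy worst-case simulation (my own sketch, assuming no filtering at all): any agent that consumes data from a corrupted neighbor becomes corrupted itself.

```python
from collections import deque

def corrupted_set(graph: dict[str, list[str]], seed: str) -> set[str]:
    """Breadth-first spread over the communication graph from one bad agent."""
    seen, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(corrupted_set(graph, "a"))  # {'a', 'b', 'c', 'd'}: one bad agent taints all four
```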
The Confused Deputy Problem in Agent Delegation
The confused deputy problem is a classic security issue, but it shows up in its own way in multi-agent systems.
The simple case: Agent A has elevated privileges and can perform sensitive actions. Agent B is restricted but can make requests to Agent A. If Agent A blindly executes Agent B's requests without scrutinizing them, Agent B can trick it into doing things it was never supposed to do.
This is worse in LLM-based systems, where delegation happens through natural language instructions. Agent B can craft a persuasive message that leads Agent A to misjudge its own authority boundaries. The confused deputy gets genuinely confused.
I tested this with a simple setup: a planner agent with limited tool access and an executor agent with full system access. Through well-crafted requests, the planner could get the executor to perform actions that violated the intended security policy, without anything that looked like an obvious injection or jailbreak.
The mitigation here is explicit role definitions, capability-based access control, and logging of delegation decisions so you can audit what went wrong later.
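Here's a minimal sketch of the capability check (the agent names, tools, and logging setup are my own illustration): the executor enforces the requester's own capabilities instead of trusting the request's wording, and logs every delegation decision for later audit.

```python
import logging

logging.basicConfig(level=logging.INFO)

# Capabilities are granted per agent, not inferred from message content.
CAPABILITIES = {
    "planner": {"read_file"},
    "executor": {"read_file", "write_file", "run_shell"},
}

def execute(requester: str, action: str, target: str) -> None:
    """Check the requester's own capabilities before acting; log the decision."""
    allowed = action in CAPABILITIES.get(requester, set())
    logging.info("delegation: %s requested %s on %s -> %s",
                 requester, action, target, "ALLOW" if allowed else "DENY")
    if not allowed:
        raise PermissionError(f"{requester} lacks capability {action!r}")
    # ... perform the action here ...

execute("planner", "read_file", "/tmp/plan.txt")   # allowed
# execute("planner", "run_shell", "rm -rf /")      # raises PermissionError
```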
Infrastructure Cascade Failures
Cascading failures in physical multi-agent systems, smart grids, traffic networks, industrial control, have real-world consequences.
Consider a smart grid with distributed energy management agents and a localized fault. If the agents overcorrect, or the safety mechanisms fail to isolate the problem, the disturbance spreads and can escalate into large-scale outages.
Traffic management systems with cooperative vehicle agents carry equivalent risks. One vehicle's failed sensor, if not correctly detected and isolated, can trigger a chain reaction: abrupt braking, rerouting collisions, or desynchronized traffic lights.
Work in this area draws heavily on control theory: designing fault-tolerant controllers, detecting and isolating faults faster, and building in redundancy at every level, from individual components to system-wide coordination.
Consensus Attacks and Byzantine Failures: Breaking the Agreement
What Consensus Means in Multi-Agent Systems
Consensus is how distributed agents reach agreement on shared state or decisions. Agents in a robot swarm may need to converge on a formation or target position. Nodes in a distributed database must agree on transaction order. In LLM agent systems, several agents may need to agree on a plan or a decision.
Consensus protocols are the fundamental building blocks of coordination. When they fracture, the whole system can disintegrate into inconsistent states, deadlocks, or endless waffling over a decision.
Byzantine Agents and Arbitrary Failures
The classical framing is the Byzantine generals problem: how do you reach consensus when some participants may be faulty or malicious, sending arbitrary (and possibly conflicting) messages to different agents? The classic result: with unsigned messages, agreement requires that fewer than a third of the participants be Byzantine.
In multi-agent security research, Byzantine agents are nodes that don't follow the protocol. They might:
Send different values to different neighbors.
Refuse to participate, stalling the system.
Collude with other Byzantine agents to amplify their influence.
Traditional resilient consensus algorithms address this by imposing network topology requirements (enough honest agents, sufficient connectivity) and by using voting or filtering mechanisms to detect and discard outliers.
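A minimal sketch of the filtering idea, in the spirit of MSR-style (Mean-Subsequence-Reduced) resilient consensus: each agent throws away the f most extreme neighbor values on each side before averaging, so up to f Byzantine neighbors can't drag the estimate around.

```python
def resilient_update(own: float, neighbor_values: list[float], f: int) -> float:
    """One consensus step: trim the f largest and f smallest neighbor values
    (where Byzantine extremes live), then average the rest with our own value."""
    trimmed = sorted(neighbor_values)[f:len(neighbor_values) - f]
    return (sum(trimmed) + own) / (len(trimmed) + 1)

# Honest neighbors cluster near 10.0; one Byzantine neighbor reports 1000.0.
print(resilient_update(10.0, [9.9, 10.2, 10.1, 1000.0], f=1))  # ~10.1, attack neutralized
```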
When I experimented with these protocols in simulation, even excellently designed algorithms collapsed once the attack pattern deviated from their assumptions. For example, algorithms built for random Byzantine failures struggle against coordinated attackers who collude and share information.
Denial of Service and Communication Jamming
Not every attack involves transmitting fake information; sometimes the attack is preventing information from being transmitted at all.
Denial-of-service attacks on inter-agent communication links can prevent consensus from forming. Agents that can't exchange messages can't coordinate. In time-critical systems like autonomous vehicle platoons, even brief communication hiccups can cause dangerous desynchronization.
Jamming attacks on wireless multi-agent systems, drone swarms, mobile sensor networks, hit the physical layer: the attacker floods the radio spectrum with noise so that communication between agents becomes ineffective.
Defenses include consensus protocols that tolerate missing messages, redundant communication paths, and adaptive algorithms that adjust to observed network conditions.
The LLM Agent Consensus Problem
LLM-based multi-agent systems add a whole new flavor of consensus failure.
Because agents communicate in natural language and lack a well-defined internal state, consensus itself is fuzzy. Agents may believe they've negotiated a plan when each has actually internalized a slightly different version of it, depending on its context window and prompting.
I tested this with a multi-agent task planning system: three agents collectively "agreed" on a schedule, but once execution began, each had a different interpretation of task dependencies and timing. There was no single source of truth in the system, just three subtly inconsistent mental models.
Recent work on governed LLM multi-agent systems addresses this by adding monitoring layers that detect when agents' understandings diverge, and by forcing agents to commit to structured formats (JSON schemas, formal task graphs) rather than ad-hoc natural-language agreement.
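A minimal sketch of the commitment idea (the plan format is hypothetical): each agent serializes the plan it believes was agreed into a canonical form, and execution is gated on the serializations matching exactly.

```python
import json

def commit(plan: dict) -> str:
    """Canonical serialization: the agent's binding commitment to the plan."""
    return json.dumps(plan, sort_keys=True, separators=(",", ":"))

def all_agree(commitments: list[str]) -> bool:
    """Execution only starts if every agent committed to the identical plan."""
    return len(set(commitments)) == 1

plan_a = {"tasks": {"build": [], "test": ["build"]}}  # test depends on build
plan_b = {"tasks": {"build": ["test"], "test": []}}   # agent B flipped the dependency
print(all_agree([commit(plan_a), commit(plan_a)]))    # True
print(all_agree([commit(plan_a), commit(plan_b)]))    # False: divergence caught pre-execution
```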
What’s Being Done to Fix These Problems
The good news: researchers aren't merely cataloging the failures, they're building defenses.
For communication vulnerabilities: hybrid protocols that use structured schemas for critical messages and natural language for higher-level reasoning. Authenticated channels and cryptographic commitments, even in LLM agent systems. Prompt engineering techniques that make injection attacks detectable.
For cascading failures: mechanisms that limit how far errors can propagate. Anomaly detection at agent boundaries to catch corrupted data before it spreads. Agent-level circuit breakers that halt an LLM agent's execution when failure signatures appear mid-run.
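Here's a minimal circuit-breaker sketch (the class and threshold are my own illustration, not any particular framework's API): after enough consecutive failures, the breaker opens and the failing agent stops receiving work.

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; refuse calls while open."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn, *args):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: agent quarantined")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1  # count the failure, re-raise for the caller
            raise
        self.failures = 0       # success resets the breaker
        return result

# Usage: wrap calls to a flaky agent so repeated failures stop propagating.
breaker = CircuitBreaker(threshold=3)
print(breaker.call(lambda x: x * 2, 21))  # 42
```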
For consensus attacks: consensus protocols resilient to adaptive Byzantine failures. Controllers that respond to observed attack behavior. Auditor and monitoring layers that record every agent decision so incidents can be analyzed after the fact.
The state of the art applies formal methods, provable safety guarantees, invariant sets, barrier certificates, to learning-based agents. There are now safe multi-agent reinforcement learning algorithms that enforce constraints like collision avoidance during training rather than bolting them on afterward.
For LLM systems in particular, the trend is security-by-design: building environments and protocols so that insecure behavior is either hard to pull off or immediately visible. That means role-based access control for tools, detailed logging of all agent activity, and dedicated auditor agents whose sole job is to watch for policy violations.

Why This Matters Now
Multi-agent AI systems are no longer a figment of the imagination. Businesses are deploying LLM agent systems for chatbots, code generation, and workflow automation. Autonomous vehicle fleets are out of simulation and on actual roads. Cities across the globe are building smart infrastructure with distributed control.
The security and coordination risks are real, and they often stay invisible until something goes wrong. A compromised agent in a customer service system could leak data or manipulate users. A consensus failure in an autonomous vehicle platoon could cause accidents. A smart grid malfunction could cascade into outages.
The academic community has identified many of these risks and built the first lines of defense, but there's still a gap between the research and deployed systems. Most of the production multi-agent systems I've reviewed don't take even simple precautions like Byzantine-resilient consensus or structured agent communication protocols.
That gap is closing, but slowly. We need more people who understand both the AI/ML side and the distributed systems security side: people who can bridge the distance between multi-agent reinforcement learning papers and deployment reality.
If you're building anything with multiple AI agents, don't assume cooperation will take care of itself. Test failure scenarios. Build in monitoring. Design for adversarial agents. These systems are powerful, but they're fragile in ways single-agent setups simply aren't.
I'm a technology writer with a passion for AI and digital marketing. I create engaging, useful content that bridges the gap between complex technology concepts and everyday readers, and I'm always digging into new innovations. Let's connect and talk technology!