Most write-ups about MITRE ATLAS stop at the single model. They describe how an adversary poisons training data or crafts an evasion input, and they leave it there. That was a reasonable scope when an AI system was a model behind an API. It is no longer reasonable when the system is an agent with tools, or a fleet of agents passing work between each other, calling MCP servers, reading retrieved documents, and acting on real systems. The threat surface moved. The vocabulary needs to move with it.
MITRE ATLAS is the right vocabulary to move. It gives security engineers a shared, structured way to talk about adversary behavior against AI systems, the same way ATT&CK does for enterprise IT. This post explains what ATLAS is, then extends it to where the interesting risk now lives: tool-using and multi-agent systems, where the MCP supply-chain surface and the agent stack become the threat model. We will keep one rule throughout. We describe ATLAS at the tactic and category level and point you to the live knowledge base for specific technique IDs, because the value of a threat model is precision, and inventing a technique number is the opposite of precision.
What MITRE ATLAS is and how to use it
MITRE ATLAS, the Adversarial Threat Landscape for Artificial-Intelligence Systems, is a knowledge base maintained by MITRE that catalogs how adversaries attack machine learning and AI systems. It is modeled directly on MITRE ATT&CK. If you have ever read an ATT&CK matrix, ATLAS will feel familiar. The columns are tactics, the adversary’s goals, and under each tactic sit techniques, the concrete ways an adversary reaches that goal.
ATLAS organizes adversary behavior into roughly fourteen tactics. Most are inherited or adapted from ATT&CK, and a couple are specific to AI. The tactic names are the shared vocabulary, and they read like the life cycle of an attack: Reconnaissance, Resource Development, Initial Access, ML Model Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Collection, ML Attack Staging (also called AI Attack Staging), Exfiltration, and Impact. Two of those, ML Model Access and ML Attack Staging, are the genuinely AI-specific tactics. They are the ones that have no clean equivalent in a traditional enterprise matrix, because they describe gaining access to a model itself and staging an attack against the model’s behavior.
A note on identifiers, because this is where most ATLAS write-ups go wrong. Tactics carry IDs in the form AML.TA followed by four digits. Techniques carry IDs in the form AML.T followed by four digits. The two are easy to confuse and easy to misquote. In this post we deliberately stay at the tactic and category level. We will name a well-known technique in plain language where it helps, for example adversarial data crafting or training data poisoning, but we will not attach a specific AML.T number to it. For the live, authoritative technique list, including exact IDs and case studies, go to atlas.mitre.org. That is the source of truth, and it changes over time.
The way to use ATLAS is not as a checklist. It is a structured prompt for a threat-modeling conversation. You walk each tactic and ask a simple question: for this system, what would an adversary do to achieve this goal, and what in our architecture makes that easier or harder. The output is a set of plausible attacker paths expressed in shared terms, which is exactly what you want to hand to a red team or a test plan.
Why multi-agent systems expand the threat surface
A single model has a narrow attack surface. You can poison its training data, you can probe it for an evasion input, you can try to extract it. Those are real risks, and ATLAS covers them well. The trouble is that almost nobody ships a bare model anymore. They ship a system around it, and every component of that system is a new place for an adversary to stand.
Tool use is the first expansion. The moment an LLM can call a tool, the model’s output stops being text and starts being action. A prompt that would have been a harmless sentence becomes a function call against a real API. The adversary’s goal shifts from making the model say something to making the model do something, and the relevant ATLAS tactics shift with it, toward Execution, Collection, and Exfiltration, because the model now has a path to all three through its tools.
Multi-agent systems expand it again, and not linearly. When agents pass work to each other, the output of one agent is the input of the next, and trust flows along those edges whether or not anyone designed it to. An adversary who compromises or manipulates one agent does not need to compromise the others. They only need that one agent’s output to be trusted downstream. This is lateral movement, an idea ATT&CK practitioners know cold, reappearing inside the agent fabric. The Discovery and Collection tactics take on new meaning when an agent can enumerate the tools and context of its peers.
The retrieval layer is part of this surface too. A retrieval-augmented agent treats fetched documents as context, and context influences behavior. If an adversary can get a document into the index, they can stage an attack that fires later when the document is retrieved, which is squarely in the spirit of the ML Attack Staging tactic. OWASP names this risk family directly as LLM08:2025 Vector and Embedding Weaknesses, and the prompt-injection vector that rides in on retrieved or tool-provided text as LLM01:2025 Prompt Injection. ATLAS gives you the adversary-goal framing; OWASP gives you the application-risk framing. They are complementary lenses, and a serious threat model uses both.
Threat-modeling a tool-using agent with ATLAS
Take a concrete system: a single agent with a model, a system prompt, three or four tools, and a retrieval index. Walking ATLAS tactics over this system produces a threat model that is specific enough to test. Here is the shape of that walk, kept at the tactic level so it maps cleanly to atlas.mitre.org.
Reconnaissance and Resource Development. What can an adversary learn about the agent before touching it, and what do they need to build first. For an agent, reconnaissance often means probing for the system prompt, the tool list, and the model family through crafted inputs. Resource development might mean preparing a poisoned document or a malicious tool definition to use later.
Initial Access and ML Model Access. How does adversary input reach the model. For a tool-using agent the answer is rarely a clean API call. It is more often indirect: a value in a retrieved document, a field returned by a tool, a message from an upstream agent. Indirect prompt injection lives here, and it is the dominant initial-access path for agentic systems because the agent reads untrusted content as if it were instruction.
Execution, Persistence, and Privilege Escalation. Once adversary content is in the context, what can it cause. Execution is the agent calling a tool the way the adversary wants. Persistence is harder and more interesting for agents: can the injected instruction survive into memory, into a stored conversation, into the retrieval index, so it fires again later. Privilege escalation, in agent terms, is OWASP’s LLM06:2025 Excessive Agency, an agent with more capability or autonomy than the task needs, so a single successful injection reaches further than it should.
Defense Evasion, Discovery, Collection, and Exfiltration. How does the adversary avoid your guardrails, learn what is reachable, gather what they want, and get it out. For agents, exfiltration is often a tool call: the data leaves through a legitimate-looking API request whose parameters the injected instruction shaped. This is the chain that turns a clever prompt into a real loss, and it is why the testing question is never just can the model be tricked but can a trick reach a tool that moves data.
The point of the walk is not to fill in every cell. It is to find the two or three attacker paths that your specific architecture makes plausible, write them down in ATLAS terms, and hand them to a test. A threat model that names paths is testable. A threat model that lists every technique in the matrix is a glossary.
The MCP layer as an ATLAS-relevant surface
There is one surface in a tool-using agent that ATLAS write-ups almost never reach, and it is the one that has changed fastest: the tool supply chain. When an agent gets its tools from an MCP server, the declared tool surface is itself an attack surface, and it maps onto ATLAS tactics in a way worth making explicit.
Recall how MCP works. A server declares a set of tools, resources, and prompts. A human reviews that surface, approves it, and ships. The risk is everything that can change after approval. An upstream operator can rewrite a tool description, add a high-privilege tool, or quietly widen a tool’s parameters, and the agent reads the live surface, so the change takes effect with no code change and no pull request on your side. We covered this threat class in depth in MCP supply-chain security: what mcp-warden catches. Here we frame it through ATLAS.
A malicious MCP operator preparing a poisoned tool definition for later use is operating in the spirit of the Resource Development tactic. Delivering that surface to a connected agent, where it becomes trusted instruction, belongs to Initial Access. A surface change that survives across sessions, so the agent keeps reading the altered tools, has the character of Persistence. We are deliberately describing these at the tactic level rather than assigning a specific AML.T number, because the honest mapping is at the category level and the live IDs belong to atlas.mitre.org. The value of the framing is that it puts the MCP rug pull where it belongs in a threat model, as adversary behavior with a goal, not as a vague worry about third-party tools.
This is also where the testing pillar and the supply-chain pillar meet. Your ATLAS threat model says an adversary could alter the tool surface after approval. The control that answers it is a supply-chain integrity gate: pin the declared surface a human approved, and fail the build when it drifts. That is exactly what mcp-warden does. It is a public, MIT-licensed tool authored by DSE’s principal that pins an MCP server’s declared tool, resource, and prompt surface into a reproducible, signed lock file, then fails CI when the live surface drifts from the approved baseline. Install it with pip install mcp-warden-cli, and note that a PyPI package named mcp-warden without the -cli suffix is an unrelated impostor. Keep its scope honest: it verifies the declared surface, not runtime behavior. It is a lockfile and drift gate, not a behavioral firewall, and it complements static tool-poisoning scanners and runtime MCP gateways rather than replacing either. OWASP files this whole concern under LLM03:2025 Supply Chain. ATLAS gives you the adversary framing; the gate gives you the control.
From threat model to a real test
A threat model is a hypothesis. It says here are the paths an adversary would plausibly take through this system. The only way to know whether those paths are open is to try them, which is the difference between a document and a finding. This is where ATLAS earns its place: it makes the test scopeable, because every path is already named in shared terms.
The translation is direct. An Initial Access hypothesis about indirect prompt injection becomes a test that plants adversarial instructions in a retrieved document and a tool response and checks whether the agent follows them. An Exfiltration hypothesis becomes a test that tries to route data out through a legitimate tool call. A multi-agent lateral-movement hypothesis becomes a test that compromises one agent’s output and watches whether a downstream agent trusts it. A supply-chain hypothesis becomes a check that the approved MCP surface still matches the baseline. Each test maps to a tactic, so the result reads as evidence against a specific adversary goal rather than a pile of disconnected observations.
This is also where the limits of automation show. A checklist scanner can confirm that a guardrail string is present. It cannot reason about whether an injected instruction in a retrieved document will reach a tool that moves data three steps later, because that path crosses components and requires adversarial creativity to find. We draw that line in detail in AI red-teaming vs a checklist scan. ATLAS does not make the test automatic. It makes the test organized, so a senior tester spends their time finding real paths instead of arguing about what to call them.
What ATLAS gives you, and what it does not.
What MITRE ATLAS gives you: a shared, structured taxonomy and knowledge base for adversary behavior against AI systems, modeled on ATT&CK, organized by tactic and technique. It is the common vocabulary that lets a threat model, a red team, and a report describe the same attack the same way. It is genuinely useful for scoping tests and communicating risk across teams.
What MITRE ATLAS does not give you: it is not a control catalog. It tells you what an adversary does, not which control to deploy or how to configure it. It is not a test plan. Walking the matrix produces hypotheses, not test cases; you still have to design and run the tests. It is not a maturity score or a certification. There is no ATLAS pass or fail, and no number falls out of using it. And threat modeling itself reduces risk, it does not eliminate it: a model is a snapshot of plausible paths at a point in time, against the architecture you understood when you drew it, and both the system and the adversary keep moving. Pair ATLAS with OWASP’s LLM Top 10 for the application-risk lens, and pair the model with real testing to find out which paths are actually open.
FAQ
What is MITRE ATLAS? MITRE ATLAS, the Adversarial Threat Landscape for Artificial-Intelligence Systems, is a knowledge base maintained by MITRE that catalogs how adversaries attack AI and machine learning systems. It is modeled on MITRE ATT&CK and organizes adversary behavior into roughly fourteen tactics, the adversary’s goals, with techniques under each. Most tactics are adapted from ATT&CK; ML Model Access and ML Attack Staging are the AI-specific ones. The live knowledge base, with exact technique IDs and case studies, is at atlas.mitre.org.
How does MITRE ATLAS apply to multi-agent and tool-using systems? ATLAS describes adversary goals that map cleanly onto the agent stack. Tool use makes the model’s output an action, pulling tactics like Execution, Collection, and Exfiltration into scope. Multi-agent systems add lateral movement, because one agent’s output is trusted as another’s input. Retrieval and tool surfaces become staging and access paths. Walking the ATLAS tactics over an agent architecture produces named, testable attacker paths rather than a vague list of AI risks.
Does ATLAS replace the OWASP LLM Top 10? No. They are complementary lenses on the same systems. ATLAS frames risk as adversary behavior, by tactic and technique, which is strong for threat modeling and red-team scoping. The OWASP LLM Top 10 2025 frames risk as application weaknesses, for example LLM01:2025 Prompt Injection, LLM03:2025 Supply Chain, LLM06:2025 Excessive Agency, and LLM08:2025 Vector and Embedding Weaknesses, which is strong for developer-facing remediation. A serious threat model uses both.
Where does the MCP supply chain fit in an ATLAS threat model? The MCP tool surface is an attack surface. A malicious operator can change a tool surface after a human approved it, with no code change on your side. In ATLAS terms that behavior maps, at the category level, onto tactics like Resource Development, Initial Access, and Persistence. The control that answers it is a supply-chain integrity gate that pins the approved surface and fails CI on drift, which is what mcp-warden does. We keep this mapping at the tactic level on purpose and point to atlas.mitre.org for specific technique IDs.
Does threat modeling with ATLAS make a system secure? No, and any tool or method that claims it does is overstating. Threat modeling reduces risk; it does not eliminate it. An ATLAS-based model is a structured snapshot of plausible attacker paths against the architecture you understood at the time. It is the input to testing, not a substitute for it, and it carries no certification, score, or guarantee. Its value is making the test organized and the risk conversation precise.
Threat modeling is where the AI Security X-Ray starts. We walk MITRE ATLAS over your actual agent stack, name the attacker paths your architecture makes plausible, including the MCP supply-chain surface, and then test the ones that matter rather than handing you a glossary. The X-Ray runs two weeks, fixed fee, with first findings in 48 hours, mapped to OWASP LLM Top 10 and MITRE ATLAS. Learn more at /ai-security-assessment.html.
Key facts
- MITRE ATLAS organizes AI adversary behavior into roughly fourteen tactics modeled on MITRE ATT&CK, of which ML Model Access and ML Attack Staging are the genuinely AI-specific tactics with no clean enterprise equivalent (DSE, 2026).
- Threat modeling with ATLAS reduces risk but does not eliminate it: the result is a structured snapshot of plausible attacker paths at a point in time, an input to testing rather than a certification, score, or guarantee (DSE, 2026).