An automated scanner runs against your LLM application, finishes in an afternoon, and hands back a clean report. Green checkmarks down the page. The board sees the report and concludes the system is safe. That conclusion is the most expensive part of the whole exercise, because the scan tested whether your application matches a list of known patterns, and an attacker is not working from your list. AI red-teaming is the opposite posture. It is adversarial, senior-led testing that treats your application the way a motivated attacker would, probing for the failure modes a static scan never reaches.
This post draws the line between the two. A checklist scan and a real test answer different questions, and confusing them is how teams ship a system they believe is hardened when it is only superficially checked. We will be specific about what each one covers, where scans go quiet, and how DSE scopes an honest, point-in-time test mapped to the OWASP Top 10 for LLM Applications and MITRE ATLAS. We will also be honest about the limits of testing itself, because a test that oversells its certainty is its own kind of false report.
What a checklist scan actually checks
A checklist scan is pattern matching against a list of known issues. It takes your application or its configuration, compares what it sees to a catalog of signatures and rules, and flags the matches. That is genuinely useful work. It catches misconfigurations, missing output filters, obvious injection sinks, exposed system prompts, and dependency versions with known advisories. If a scanner runs in your pipeline and you act on its findings, you are better off than a team that runs nothing.
The trouble starts when the scan’s output gets read as a verdict on safety rather than a verdict on the patterns it was built to find. A scanner answers one question well: does this system match any of the issues I already know about? It does not answer the question that matters for an AI system under real adversarial pressure, which is whether someone clever can make the system do something it was never supposed to do.
Two properties of scanners create the gap. First, they are signature-bound. A scanner finds what its rules describe, and the interesting attacks on LLM applications are emergent and compositional, assembled from steps that each look benign in isolation. Second, they are stateless about your business logic. A scanner does not know that your support agent has authority to issue refunds, or that your retrieval index pulls from a wiki anyone in the company can edit. The dangerous behavior in an AI application lives in exactly those context-specific seams, and a generic rule cannot see them.
Where checklist scans go quiet (the real attack surface)
The real attack surface of an LLM application is not a list of files. It is the space of inputs an attacker can craft and the actions the system can be coaxed into taking. That space is adversarial and open-ended, and it is where scanners go quiet.
Start with the prompt boundary. LLM01:2025 Prompt Injection sits at the top of the OWASP Top 10 for LLM Applications for a reason. An attacker does not need a software vulnerability in the traditional sense. They need to get text in front of the model that the model treats as instruction, whether that text arrives in a user message, a retrieved document, a tool result, or a file the agent was asked to summarize. A scanner can flag the absence of an output filter. It cannot enumerate the infinite ways a sentence can be phrased to slip past one.
Then there is what the system is allowed to do once it is confused. LLM06:2025 Excessive Agency is the risk that an agent has more capability, autonomy, or permission than the task requires, so a successful injection turns into a real action rather than a stray sentence. A scanner can list the tools an agent has. It does not reason about what an attacker can chain those tools into, or which combination of a benign read and a benign write becomes data exfiltration when sequenced by a clever prompt.
The retrieval layer is the third quiet zone. LLM08:2025 Vector and Embedding Weaknesses and LLM04:2025 Data and Model Poisoning describe what happens when the content your system trusts and retrieves is itself the attack vector. A scan of the application code says nothing about whether the documents in your index can be poisoned by someone who can edit a wiki page. These are not edge cases. For most production LLM applications, this is the center of the board.
Red-teaming an LLM application end to end
AI red-teaming treats the application as a system an attacker will study, not a codebase a tool will fingerprint. The work is structured around the named risks, so findings map to a shared vocabulary your team and your auditors already understand, but the technique is adversarial and human-driven. A senior tester forms hypotheses about how the system can be made to misbehave, then tries to prove them.
End to end means the test does not stop at the model. It follows the data and the authority. We look at the input boundary (every channel where attacker-controlled text can reach the model), the system prompt and its leakage surface (LLM07:2025 System Prompt Leakage, because a leaked prompt hands an attacker the map), output handling (LLM05:2025 Improper Output Handling, where model output flows into a browser, a shell, or a downstream API without being treated as untrusted), and the action surface (the tools, the agent’s autonomy, and the blast radius of a single coerced action).
We anchor the adversary’s behavior to MITRE ATLAS, the public knowledge base of real-world adversary tactics and techniques against AI systems. ATLAS organizes that behavior into tactics, the attacker’s goals at each stage, such as reconnaissance, gaining ML model access, AI/ML attack staging, exfiltration, and impact. Mapping a finding to the tactic it advances turns a one-off exploit into a threat-model entry your team can reason about and defend systematically. (We describe ATLAS at the tactic level here on purpose, rather than asserting a specific technique number, because the right answer for a given finding depends on the exact behavior observed in your system.)
The output of a real test is not a pass or fail stamp. It is a set of demonstrated attack paths, each with the steps to reproduce it, the named risk it maps to, the ATLAS tactic it advances, the business impact if exploited, and a concrete remediation. That is a different artifact from a scan report, and it is the one that actually changes what you ship.
Prompt injection across the agent stack
Prompt injection (LLM01:2025) is treated by many teams as a single-surface problem: sanitize the user’s message and you are done. In an agentic system, that framing is dangerously incomplete, because the user’s message is only one of many places untrusted text enters the model’s context.
Consider the channels in a typical tool-using agent. The user types a message. The agent retrieves documents to ground its answer. It calls a tool and reads the tool’s output back into context. It reads a file, an email, a web page, or a webhook payload it was asked to act on. Every one of those is a place an attacker can plant instructions, and the model does not natively distinguish data it should act on from data it should merely read. This is indirect prompt injection, and it is the version that breaks production systems, because the attacker never has to touch the user-facing input at all.
The agent stack multiplies the problem. In a multi-step or multi-agent system, the output of one model becomes the input of the next. A planted instruction can survive several hops, mutate as it passes through reasoning steps, and detonate at a stage far from where it entered. A scanner that checks the user-input sanitizer has tested one door in a building full of windows. Red-teaming walks the whole perimeter: we attempt injection through retrieved content, through tool results, through inter-agent messages, and through any file or payload the agent ingests, then trace whether the injected instruction reaches a tool with real authority. The supply-chain surface underneath those tools is its own discipline, which we cover in MCP Supply Chain Security, and the multi-agent threat model is the subject of MITRE ATLAS for Tool-Using and Multi-Agent AI.
RAG poisoning and retrieval trust
Retrieval-augmented generation is where many teams quietly grant attackers a privileged channel without realizing it. The premise of RAG is that the model grounds its answers in your trusted documents. The unstated assumption is that the documents are trustworthy. RAG poisoning attacks that assumption directly, and it maps to LLM08:2025 Vector and Embedding Weaknesses and LLM04:2025 Data and Model Poisoning.
The mechanism is simple and that is what makes it dangerous. If an attacker can influence the content that lands in your retrieval index, they can plant a document whose text functions as an instruction to the model, or whose framing biases every answer that retrieves it. The attacker does not need access to your model, your weights, or your prompt. They need edit access to a source the index ingests: a shared wiki, a ticketing system, a public web page you crawl, a user-uploaded file, or any data feed that flows into embeddings without review. Once the poisoned content is indexed, it sits dormant until a relevant query retrieves it, then it shapes the model’s output for whoever asked.
Retrieval trust is the property a real test interrogates. We ask where the index gets its content, who can write to those sources, whether ingested content is treated as untrusted instruction-bearing data or as gospel, and what an attacker can make the system say or do by planting content rather than by attacking the model. We also test the embedding layer itself, because LLM08 covers weaknesses in how content is vectorized and matched, not only what the content says. A checklist scan of your RAG application code does not test any of this, because the vulnerability is not in your code. It is in your trust boundary, and trust boundaries are exactly what adversarial testing exists to probe.
How DSE scopes a real test (sampling, point-in-time, honest limits)
A real test is only credible if its scope is honest, so we state ours plainly. AI red-teaming is adversarial, senior-led, and bounded. It is not exhaustive, and any provider who claims their testing is exhaustive is selling certainty they cannot deliver.
We scope a test around your highest-value attack surfaces, agreed with you up front: the model and its prompt boundary, the agent’s tool and action surface, the retrieval and supply-chain layer, output handling, and the system prompt’s leakage exposure. Within that scope, senior testers work adversarially, anchoring findings to the OWASP Top 10 for LLM Applications and to MITRE ATLAS tactics so the results plug into your existing threat model. Testing is hands-on and human-led, not a tool run with a logo on the report, because the attacks that matter are the ones a tool’s signatures do not yet describe.
Three honesty constraints define what the test is and is not.
What red-teaming covers / What it does not cover
What it covers: Adversarial, senior-led probing of the scoped attack surfaces, with demonstrated attack paths, reproduction steps, named-risk mapping (OWASP Top 10 for LLM Applications) and ATLAS-tactic mapping, business-impact framing, and concrete remediation guidance. The goal is to find the failure modes that matter most and give your team what it needs to fix them.
What it does not cover: Testing is point-in-time. It reflects the system as it existed during the engagement, and a model update, a new tool, a config change, or a re-indexed corpus can introduce risk the day after we finish. Testing is sampling-based. Adversarial input space is effectively infinite, so a real test prioritizes high-likelihood, high-impact paths rather than enumerating every possible input. Testing is scoped. We test what we agreed to test, and surfaces outside that scope are not covered. Red-teaming reduces risk; it does not eliminate it, and it makes no certification, conformity, or guarantee claim. The honest framing is readiness improved and concrete weaknesses found and fixed, not a system pronounced safe.
That honesty is the point of the engagement, not a disclaimer bolted onto it. A test that hides its limits produces the same false confidence as the clean scan report it was meant to replace. A test that states its limits, and still demonstrates real attack paths inside an agreed scope, is the artifact that actually moves your security posture.
FAQ
What is AI red-teaming? AI red-teaming is adversarial, human-led security testing of an AI application, where senior testers probe for ways to make the system behave in ways it was never intended to. It maps findings to shared frameworks such as the OWASP Top 10 for LLM Applications and MITRE ATLAS, and it focuses on emergent, compositional attacks like prompt injection, excessive agency, and RAG poisoning that automated pattern matching does not reach.
How is AI red-teaming different from a checklist scan? A checklist scan compares your system to a catalog of known issues and flags the matches, which is useful for misconfigurations and known patterns but blind to novel, compositional attacks. AI red-teaming is adversarial and human-driven: a senior tester forms hypotheses about how the system can be coerced and tries to prove them, producing demonstrated attack paths rather than a list of signature matches. The two answer different questions, and a scan should not be read as a verdict on safety.
Does AI red-teaming cover prompt injection and RAG poisoning? Yes. Prompt injection (LLM01:2025) is tested across the whole agent stack, including indirect injection through retrieved documents, tool results, files, and inter-agent messages, not just the user input box. RAG poisoning maps to LLM08:2025 Vector and Embedding Weaknesses and LLM04:2025 Data and Model Poisoning, and it is tested by examining who can write to the sources your retrieval index ingests and whether that content is treated as untrusted.
Can a test guarantee my AI system is secure? No, and any provider who guarantees it is overselling. Red-teaming is point-in-time, sampling-based, and scoped: it reflects the system during the engagement, prioritizes high-impact attack paths rather than every possible input, and covers only the agreed surfaces. It reduces risk and surfaces concrete weaknesses to fix, which is real value, but it makes no certification or guarantee claim, and a later change to the model, tools, or data can introduce new risk.
What frameworks does DSE map findings to? Findings are mapped to the OWASP Top 10 for LLM Applications by ID and name, so each issue plugs into a vocabulary your team and auditors already use, and to MITRE ATLAS at the tactic level, so each demonstrated attack path becomes a threat-model entry tied to an adversary goal such as reconnaissance, ML model access, attack staging, exfiltration, or impact.
A clean scan report and a real test are not the same artifact, and only one of them tells you what a motivated attacker can actually do to your AI application. A real test starts with the AI Security X-Ray: two weeks, fixed fee, with first findings in 48 hours, and every finding mapped to the OWASP Top 10 for LLM Applications and MITRE ATLAS. If you want a senior team to pressure-test your LLM application the way an attacker would, rather than the way a scanner does, see /ai-security-assessment.html.
Key facts
- A checklist scan is signature-bound pattern matching against known issues, while AI red-teaming is adversarial, senior-led testing that probes the emergent, compositional attacks a static scan cannot reach (DSE, 2026).
- DSE red-teaming is point-in-time, sampling-based, and scoped: it reduces risk and surfaces concrete weaknesses but makes no certification or guarantee claim, with every finding mapped to the OWASP Top 10 for LLM Applications and MITRE ATLAS (DSE, 2026).