Senior-led AI & LLM security testing for the systems you are actually shipping — RAG pipelines, copilots, and tool-using agents. We red-team the model, the retrieval layer, and the agentic layer against the OWASP LLM Top 10, then hand you evidence-backed findings with a remediation roadmap, not a slide deck.
Fixed-fee, fixed-scope. A principal runs the engagement end to end — the person who scopes the work is the person on the keyboard.
Most "AI security" scans stop at a chat input. Real exposure lives across the retrieval layer, the tools your agent can call, and the runtime that logs and pays for it. We test all five surfaces and map every finding to the OWASP LLM Top 10 (2025).
Direct and indirect prompt injection, jailbreaks, multi-turn Crescendo attacks, system-prompt leakage, and improper output handling that lets model text reach a browser or shell unescaped.
RAG poisoning, embedding-space attacks, retrieval of unauthorized documents, and sensitive-information disclosure through the context window. We seed adversarial documents and watch what the model will repeat.
Excessive agency, tool abuse, confused-deputy chains, and the question every agent team should be able to answer — can the agent be steered into actions outside its purpose, and can you stop it once it has started.
Model and data provenance, poisoning exposure, and the dependency surface around it — including MCP supply-chain review using the same integrity checks we shipped in our public mcp-warden gate.
Unbounded consumption and cost-amplification attacks, guardrail bypass, logging gaps that hide an incident, and missing rate, spend, and abuse controls around the deployment.
Every engagement follows the same five phases, anchored to the OWASP LLM Top 10 and MITRE ATLAS so findings are defensible to your engineers, your auditors, and your buyers.
Map the system, the trust boundaries, the data it touches, and the abuse cases that matter to your business. Agree on rules of engagement, staging-vs-prod safety, and what "in scope" means in writing.
Inventory models, prompts, retrieval sources, tools, and the surrounding APIs. Produce an annotated architecture diagram and a threat model the whole team can read.
Hands-on adversarial testing across all five surfaces — prompt injection, RAG poisoning, excessive agency, tool abuse, and consumption attacks — with reproducible payloads and captured transcripts.
Confirm each finding, weed out false positives, and score impact and likelihood so you can triage. Real exploits, with proof — never a checklist of theoretical risks.
An executive summary, evidence-backed technical findings with repro steps, a remediation roadmap your engineers can act on, and a reusable test harness. Optional retest after you ship the fixes.
Coverage & deliverables. Every engagement maps findings across all ten OWASP LLM Top 10 categories (LLM01–LLM10) plus the relevant MITRE ATLAS techniques, and ships an executive summary, evidence-backed technical findings with reproduction steps, a prioritized remediation roadmap, and a reusable test harness. A sanitized sample report is available under NDA on request.
Start with a diagnostic, move to a full red team, then keep coverage as your models and prompts change. Every tier is fixed-scope and fixed-fee — you know what you are buying before you buy it.
What the Sprint does not include: the fixed fee covers one LLM application and its RAG and agent layers. It is not a network or infrastructure penetration test, not source-code audit of the surrounding stack, not 24/7 monitoring or managed detection, and not unlimited retesting — retest is scoped to confirmed critical and high findings within 30 days of the report. Additional applications, environments, or retest windows are scoped and priced separately.
Govcon overlay (optional): unclassified AI governance, NIST AI RMF readiness, and control-mapping for COTS / SaaS pursuing federal authorization — a NIST AI RMF + FedRAMP + CMMC mapping annex, controlled-data handling, and an MSA / DPA + vendor-risk package. Advisory control-mapping, not certification; we do not perform classified work or act as a prime on classified vehicles.
Indicative bands. The prices above are starting ranges. Final scope and a fixed price are confirmed in a discovery call before any engagement begins.
The Red Team Sprint ships a mapping annex so your security and compliance teams can connect each finding to the frameworks they already answer to.
| Framework | What we map to it |
|---|---|
| NIST AI RMF | Govern / Map / Measure / Manage functions — findings mapped to the AI risk a control is meant to mitigate. |
| EU AI Act | Risk-tier obligations and the testing / robustness expectations for high-risk and general-purpose AI systems. |
| FedRAMP | AI-overlay considerations for systems pursuing or holding an authorization, with controlled-data handling. |
| CMMC | AI security considerations for defense-industrial-base contractors handling CUI. |
| SOC 2 | Security and availability criteria touched by your AI deployment, framed for an auditor. |
Your LLM application end to end — prompts, the RAG / retrieval layer, the tools and agents it can drive, the model supply chain, and the runtime. Exact scope is fixed in writing during the scoping phase so there are no surprises in either direction.
A real test. We run hands-on adversarial attacks with reproducible payloads and captured transcripts. The checklist (OWASP LLM Top 10, ATLAS) is how we organize coverage — not what we hand you instead of evidence.
Every finding is verified before it ships. The verification phase exists specifically to confirm exploitability and drop anything that does not reproduce, so your engineers spend time on real risk.
Under an MSA / DPA with defined handling and retention. Test artifacts and transcripts are scoped to the engagement; we agree retention and deletion terms up front, and we support controlled-data constraints for federal work.
Provider-neutral. We test the system you run — hosted APIs, open-weight models, or a mix — including the RAG and agent layers around them, regardless of who built the underlying model.
Whatever is safe and representative. We prefer a staging mirror for destructive tests and agree blast-radius limits, rate caps, and rollback before any test touches a live system.
The roadmap is part of every tier. The Red Team Sprint includes a 30-day retest after you ship fixes; the X-Ray retest can be added; the Co-Pilot retainer covers continuous re-testing as you change.
Yes. We provide an MSA / DPA and a vendor-risk package, and we are set up to answer security questionnaires. The Govcon overlay adds controlled-data handling and federal mapping.
We provide unclassified AI governance and NIST AI RMF readiness for COTS / SaaS pursuing federal authorization, and we scope engagements for controlled-data handling from the start. To be clear: this is advisory control-mapping. We are not a FedRAMP 3PAO or CMMC C3PAO and do not perform certification or authorization.
Tell us what you are shipping — a RAG app, a copilot, an agent platform — and we will scope a fixed-fee assessment and respond within 48 hours. A principal runs it, start to finish.
Scope a call →An AI security assessment is a point-in-time evaluation of the system as scoped and as it exists during testing. It is a rigorous, evidence-backed read on exploitable risk — not a certification, attestation, or warranty that a system is secure.
We are a lean, senior advisory firm. We do not run a 24/7 SOC and do not provide round-the-clock monitoring or managed detection and response. Where continuous coverage is needed, it is scoped to a retainer or delivered through a vetted partner you contract.
We make your AI systems more defensible and give you the evidence to act. We never claim to prevent every attack, find every flaw, or guarantee an outcome we cannot control. Findings are sampling-based and time-boxed; an assessment cannot guarantee the absence of vulnerabilities.
DSE provides advisory security consulting and control-mapping. We are not a FedRAMP 3PAO, a CMMC C3PAO, or a Registered Provider Organization (RPO), and we do not perform regulatory certification or authorization. Where we describe "mapping to NIST AI RMF, the EU AI Act, FedRAMP, CMMC, or SOC 2," that means advisory alignment, not certification.
All engagements are governed by a signed SOW / MSA that includes a limitation of liability and requires written client authorization to test the in-scope systems before any testing begins.