§ AI & LLM Security Assessment·OWASP LLM Top 10 · NIST AI RMF

Find prompt injection, data leakage, and agent abuse before attackers do.

Senior-led AI & LLM security testing for the systems you are actually shipping — RAG pipelines, copilots, and tool-using agents. We red-team the model, the retrieval layer, and the agentic layer against the OWASP LLM Top 10, then hand you evidence-backed findings with a remediation roadmap, not a slide deck.

Fixed-fee, fixed-scope. A principal runs the engagement end to end — the person who scopes the work is the person on the keyboard.

LLM security testing for RAG, copilots, and AI agents. Compliance-ready AI red teaming aligned to NIST AI RMF. Security testing for enterprise AI deployments, not just demo chatbots.
Scope a call first findings in 48h on the X-Ray · a principal, every time
§ What we test·five attack surfaces

The whole stack an attacker sees — not just the prompt box.

Most "AI security" scans stop at a chat input. Real exposure lives across the retrieval layer, the tools your agent can call, and the runtime that logs and pays for it. We test all five surfaces and map every finding to the OWASP LLM Top 10 (2025).

Surface 01

Input & output

Direct and indirect prompt injection, jailbreaks, multi-turn Crescendo attacks, system-prompt leakage, and improper output handling that lets model text reach a browser or shell unescaped.

LLM01 · LLM02 · LLM05 · LLM07
Surface 02

Retrieval (RAG / vector DB)

RAG poisoning, embedding-space attacks, retrieval of unauthorized documents, and sensitive-information disclosure through the context window. We seed adversarial documents and watch what the model will repeat.

LLM02 · LLM08
Surface 03

Tool & agentic layer

Excessive agency, tool abuse, confused-deputy chains, and the question every agent team should be able to answer — can the agent be steered into actions outside its purpose, and can you stop it once it has started.

LLM06 · Agentic Top 10
Surface 04

Model & supply chain

Model and data provenance, poisoning exposure, and the dependency surface around it — including MCP supply-chain review using the same integrity checks we shipped in our public mcp-warden gate.

LLM03 · LLM04
Surface 05

Runtime & ops

Unbounded consumption and cost-amplification attacks, guardrail bypass, logging gaps that hide an incident, and missing rate, spend, and abuse controls around the deployment.

LLM10
§ How we work·five phases

A repeatable method, not a one-off scan.

Every engagement follows the same five phases, anchored to the OWASP LLM Top 10 and MITRE ATLAS so findings are defensible to your engineers, your auditors, and your buyers.

01

Scope & threat-model

Map the system, the trust boundaries, the data it touches, and the abuse cases that matter to your business. Agree on rules of engagement, staging-vs-prod safety, and what "in scope" means in writing.

02

Recon & architecture review

Inventory models, prompts, retrieval sources, tools, and the surrounding APIs. Produce an annotated architecture diagram and a threat model the whole team can read.

03

Exploitation

Hands-on adversarial testing across all five surfaces — prompt injection, RAG poisoning, excessive agency, tool abuse, and consumption attacks — with reproducible payloads and captured transcripts.

04

Verification & risk scoring

Confirm each finding, weed out false positives, and score impact and likelihood so you can triage. Real exploits, with proof — never a checklist of theoretical risks.

05

Report & fix plan

An executive summary, evidence-backed technical findings with repro steps, a remediation roadmap your engineers can act on, and a reusable test harness. Optional retest after you ship the fixes.

OWASP LLM Top 10 (2025) OWASP Agentic Top 10 MITRE ATLAS promptfoo · Garak · PyRIT Burp / ZAP for surrounding APIs custom RAG-poison harness

Coverage & deliverables. Every engagement maps findings across all ten OWASP LLM Top 10 categories (LLM01–LLM10) plus the relevant MITRE ATLAS techniques, and ships an executive summary, evidence-backed technical findings with reproduction steps, a prioritized remediation roadmap, and a reusable test harness. A sanitized sample report is available under NDA on request.

§ The offer ladder·diagnostic → sprint → co-pilot

Three fixed-fee tiers, transparent price bands.

Start with a diagnostic, move to a full red team, then keep coverage as your models and prompts change. Every tier is fixed-scope and fixed-fee — you know what you are buying before you buy it.

Entry · diagnostic

AI Security X-Ray

2 weeks · one LLM application
$12k–$18k
  • OWASP LLM Top 10 sweep on one app
  • Prompt-injection, jailbreak, system-prompt-leak, RAG-poison sampling
  • Executive brief + technical findings
  • 90-day remediation roadmap
  • First findings in 48 hours
Full red team

AI Red Team Sprint

4 weeks · one LLM application incl. its RAG + agent layers
$35k–$55k
  • Full OWASP LLM Top 10 + Agentic Top 10 + MITRE ATLAS, scoped to one application
  • Multi-turn / Crescendo, excessive-agency & tool-abuse testing
  • MCP supply-chain review (mcp-warden)
  • Quantitative scoring + engineering playbook
  • Reusable test harness you keep
  • NIST AI RMF / EU AI Act / SOC 2 mapping annex
  • Retest of confirmed critical & high findings within 30 days of report
Retainer

AI Security Co-Pilot

ongoing · continuous coverage
from $8.5k/mo
  • Continuous coverage as models, RAG, prompts, and agents ship
  • Quarterly full re-audit
  • Priority scheduling for new releases
  • Standing access to a senior AI-security practitioner

What the Sprint does not include: the fixed fee covers one LLM application and its RAG and agent layers. It is not a network or infrastructure penetration test, not source-code audit of the surrounding stack, not 24/7 monitoring or managed detection, and not unlimited retesting — retest is scoped to confirmed critical and high findings within 30 days of the report. Additional applications, environments, or retest windows are scoped and priced separately.

Govcon overlay (optional): unclassified AI governance, NIST AI RMF readiness, and control-mapping for COTS / SaaS pursuing federal authorization — a NIST AI RMF + FedRAMP + CMMC mapping annex, controlled-data handling, and an MSA / DPA + vendor-risk package. Advisory control-mapping, not certification; we do not perform classified work or act as a prime on classified vehicles.

Indicative bands. The prices above are starting ranges. Final scope and a fixed price are confirmed in a discovery call before any engagement begins.

§ Why us·real IP, not slideware

We build the security tools we test with.

Public IP
mcp-warden
Our open-source MCP supply-chain security gate — 164 tests, default-block hardening, JCS + SHA-256 integrity lock. We use it on every Sprint's MCP review. See the repo ↗
Adversarial tooling
conclave
A multi-model council we built for adversarial design review — the same mindset we bring to red-teaming your AI system: assume the model is hostile and prove otherwise.
Method & people
Senior-only bench
Engagements run on a published method — OWASP LLM Top 10 + MITRE ATLAS — by senior-only practitioners. No junior hand-off, no rented dashboard. The person who scopes the work is the person on the keyboard.
§ Compliance mapping·findings that map to frameworks

Every finding ties to a control your auditors recognize.

The Red Team Sprint ships a mapping annex so your security and compliance teams can connect each finding to the frameworks they already answer to.

FrameworkWhat we map to it
NIST AI RMFGovern / Map / Measure / Manage functions — findings mapped to the AI risk a control is meant to mitigate.
EU AI ActRisk-tier obligations and the testing / robustness expectations for high-risk and general-purpose AI systems.
FedRAMPAI-overlay considerations for systems pursuing or holding an authorization, with controlled-data handling.
CMMCAI security considerations for defense-industrial-base contractors handling CUI.
SOC 2Security and availability criteria touched by your AI deployment, framed for an auditor.
§ Objection FAQ·the questions buyers actually ask

Straight answers before you scope.

What exactly do you test?

Your LLM application end to end — prompts, the RAG / retrieval layer, the tools and agents it can drive, the model supply chain, and the runtime. Exact scope is fixed in writing during the scoping phase so there are no surprises in either direction.

Is this a real test or a checklist?

A real test. We run hands-on adversarial attacks with reproducible payloads and captured transcripts. The checklist (OWASP LLM Top 10, ATLAS) is how we organize coverage — not what we hand you instead of evidence.

How do you handle false positives?

Every finding is verified before it ships. The verification phase exists specifically to confirm exploitability and drop anything that does not reproduce, so your engineers spend time on real risk.

How do you handle our prompts and data?

Under an MSA / DPA with defined handling and retention. Test artifacts and transcripts are scoped to the engagement; we agree retention and deletion terms up front, and we support controlled-data constraints for federal work.

Which models and providers do you cover?

Provider-neutral. We test the system you run — hosted APIs, open-weight models, or a mix — including the RAG and agent layers around them, regardless of who built the underlying model.

Do you test against staging or production?

Whatever is safe and representative. We prefer a staging mirror for destructive tests and agree blast-radius limits, rate caps, and rollback before any test touches a live system.

Is remediation and retest included?

The roadmap is part of every tier. The Red Team Sprint includes a 30-day retest after you ship fixes; the X-Ray retest can be added; the Co-Pilot retainer covers continuous re-testing as you change.

Can your team clear procurement?

Yes. We provide an MSA / DPA and a vendor-risk package, and we are set up to answer security questionnaires. The Govcon overlay adds controlled-data handling and federal mapping.

Are you acceptable for federal / controlled-data work?

We provide unclassified AI governance and NIST AI RMF readiness for COTS / SaaS pursuing federal authorization, and we scope engagements for controlled-data handling from the start. To be clear: this is advisory control-mapping. We are not a FedRAMP 3PAO or CMMC C3PAO and do not perform certification or authorization.

§ Related·where this sits
Safe AI & Security Foundation → Free 30-min Cyber Risk Check → AI Security ladder → Federal capability →
§ Scope a call·fixed-fee, scoped up front

Find the holes in your AI before someone else does.

Tell us what you are shipping — a RAG app, a copilot, an agent platform — and we will scope a fixed-fee assessment and respond within 48 hours. A principal runs it, start to finish.

Scope a call
§ What this is·and what it isn't

Point-in-time security testing. Not a guarantee.

An AI security assessment is a point-in-time evaluation of the system as scoped and as it exists during testing. It is a rigorous, evidence-backed read on exploitable risk — not a certification, attestation, or warranty that a system is secure.

We are a lean, senior advisory firm. We do not run a 24/7 SOC and do not provide round-the-clock monitoring or managed detection and response. Where continuous coverage is needed, it is scoped to a retainer or delivered through a vetted partner you contract.

We make your AI systems more defensible and give you the evidence to act. We never claim to prevent every attack, find every flaw, or guarantee an outcome we cannot control. Findings are sampling-based and time-boxed; an assessment cannot guarantee the absence of vulnerabilities.

DSE provides advisory security consulting and control-mapping. We are not a FedRAMP 3PAO, a CMMC C3PAO, or a Registered Provider Organization (RPO), and we do not perform regulatory certification or authorization. Where we describe "mapping to NIST AI RMF, the EU AI Act, FedRAMP, CMMC, or SOC 2," that means advisory alignment, not certification.

All engagements are governed by a signed SOW / MSA that includes a limitation of liability and requires written client authorization to test the in-scope systems before any testing begins.