AML transaction monitoring model validation is the independent assessment of a bank’s automated transaction-monitoring system to confirm that it detects suspicious activity reliably, that its scenarios and thresholds are tuned with evidence, and that its outputs hold up under examination. For a statistical or machine-learning monitoring system, that validation now sits inside the general model risk framework of SR 26-2, the interagency guidance issued April 17, 2026, that superseded both SR 11-7 and the SR 21-8 BSA/AML statement. It also runs alongside the FFIEC BSA/AML independent-testing obligation, a separate and mandatory program pillar that does not go away because a system is a model. The two obligations together, not either one alone, are what a BSA Officer and a Head of Model Risk are jointly accountable for.
Why BSA/AML monitoring needs validation now
The supervisory lineage matters, because the wrong version of it produces the wrong program. SR 11-7, the 2011 joint Federal Reserve and OCC guidance, set the general model risk discipline: development and use, independent validation, and governance. On April 9, 2021 the agencies issued an interagency statement, carried for the Fed as SR 21-8, that affirmed the SR 11-7 principles apply to BSA/AML systems that meet the definition of a model, and tailored how those principles apply. The 2021 statement allowed banks to categorize systems, preserved flexibility to update scenarios and thresholds quickly as financial-crime typologies shift, and worked to avoid duplicative testing across the model risk and BSA/AML functions. What it did not do is exempt BSA/AML monitoring from model risk management. Reading SR 21-8 as a carve-out was always a misread.
That entire stack was replaced on April 17, 2026. SR 26-2, the Revised Guidance on Model Risk Management, supersedes SR 11-7, and on the OCC side Bulletin 2026-13 rescinds OCC Bulletin 2011-12 and OCC Bulletin 2021-19, the OCC counterpart to the BSA/AML statement. The practical consequence is structural: there is no longer a separate BSA/AML-specific model risk statement to point to. A BSA/AML system that meets the revised model definition now folds into the single, general SR 26-2 framework. SR 26-2 is non-binding guidance and does not set enforceable standards on its own, but model risk that is left poorly managed can still be treated as an unsafe or unsound practice, which is a basis for supervisory action under existing authority.
Alongside the model risk track sits the obligation that was always there. The FFIEC BSA/AML Examination Manual makes independent testing one of the core pillars of a BSA/AML compliance program. That testing is risk-based, performed by qualified parties independent of the BSA function or by a third party, and it covers the effectiveness of automated transaction-monitoring and suspicious-activity systems. The manual does not use the SR 11-7 or SR 26-2 model-validation vocabulary, and it would be inaccurate to claim it mandates “model validation” by name. In practice, where a monitoring system is model-like, examiners expect to see validation, scenario and threshold tuning, and effectiveness testing as part of that independent review.
Is your transaction-monitoring system a model?
SR 26-2 narrowed the model definition to complex quantitative methods that apply statistical, economic, or financial theory. That narrowing is the first question every program now has to answer for each system it runs, because the answer routes the system to a different obligation set.
A statistical or machine-learning transaction-monitoring system is a model. Segmentation that learns customer peer groups, anomaly detection that scores deviation from expected behavior, and alert-scoring or risk-ranking that prioritizes investigations all apply quantitative theory to produce estimates, which puts them squarely inside the SR 26-2 model definition and therefore in scope for validation. By contrast, a simple deterministic rules engine (a fixed if-then threshold with no statistical estimation) may fall outside the revised model definition. That distinction is real, but it is not an exit. A rules engine that falls outside SR 26-2 is still governed by the independent BSA/AML testing obligation and by safety-and-soundness expectations. It still has to work, and you still have to prove it works.
This is also where the AI angle becomes concrete. Banks increasingly run machine learning inside the monitoring stack precisely to cut false positives and surface novel typologies, and every one of those machine-learning components is a model that needs validation under SR 26-2 and effectiveness testing under the FFIEC manual. For institutions extending model risk discipline to learned systems, our companion guide on AI model risk management under SR 11-7 and the breakdown of what changed under SR 26-2 walk the framework shift in detail.
What independent validation of an AML monitoring model covers
Validation of an AML monitoring model carries the three model risk pillars over to the financial-crime context and adds the effectiveness lens the FFIEC manual expects. Conceptual soundness asks whether the scenario logic and segmentation map to the institution’s own BSA/AML risk assessment and to the typologies it actually faces, rather than to a vendor’s default library. Data integrity asks whether the transaction, customer, and reference data feeding the system is complete and accurate, because a monitoring model is only as good as the feeds beneath it. Ongoing monitoring and outcomes analysis track alert-to-SAR yield, productivity, and tuning drift over time. Through all of it, independence is the non-negotiable: the work is done by qualified parties independent of the BSA function, or by a third party, and documented so a reviewer could follow it end to end.
The scope below is the practical version of that discipline, a checklist a validator can work against.
- Conceptual soundness. Scenario and segmentation logic ties back to the institution’s BSA/AML risk assessment and documented typologies.
- Data integrity. Completeness and accuracy of the transaction, customer, and reference data feeding the system is tested, not assumed.
- Coverage. Every product, channel, and high-risk typology in the risk assessment maps to at least one active scenario, with gaps named.
- Threshold and parameter tuning. Above-the-line and below-the-line evidence supports each production setting, refreshed on a defined cadence.
- Ongoing monitoring and outcomes analysis. Alert-to-SAR yield, alert productivity, and tuning drift are tracked and trended.
- Independence and qualification. Testing is performed by qualified parties independent of the BSA function, or by a third party.
- Documentation. The work produces an audit-ready record a third party could reproduce.
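The coverage item in the checklist above lends itself to a mechanical check. The following is a minimal sketch, not a prescribed method: the `Scenario` type, the typology labels, and the scenario names are all hypothetical, and a real program would load them from its own risk assessment and scenario inventory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str                   # hypothetical scenario identifier
    typologies: frozenset[str]  # typologies the scenario is designed to detect
    active: bool                # only active scenarios count toward coverage


def map_coverage(risk_typologies: set[str], scenarios: list[Scenario]) -> dict[str, list[str]]:
    """Map each risk-assessment typology to the active scenarios that cover it."""
    coverage: dict[str, list[str]] = {t: [] for t in sorted(risk_typologies)}
    for scenario in scenarios:
        if not scenario.active:
            continue  # a retired or disabled scenario provides no coverage
        for typology in scenario.typologies:
            if typology in coverage:
                coverage[typology].append(scenario.name)
    return coverage


# Illustrative inputs only; none of these names come from a real program.
risk_typologies = {"structuring", "rapid_movement_of_funds", "funnel_account"}
scenarios = [
    Scenario("cash_structuring_v3", frozenset({"structuring"}), active=True),
    Scenario("wire_velocity_v1", frozenset({"rapid_movement_of_funds"}), active=True),
    Scenario("funnel_legacy", frozenset({"funnel_account"}), active=False),
]

coverage = map_coverage(risk_typologies, scenarios)
gaps = [typology for typology, names in coverage.items() if not names]
print("Typologies with no active scenario:", gaps)
```

In this toy inventory the only scenario for `funnel_account` is inactive, so that typology is reported as a gap. Naming the gap explicitly, rather than leaving it implicit, is the point of the coverage item.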
Above-the-line and below-the-line testing: the AML tuning method
The single most distinctive piece of AML monitoring validation is threshold and parameter tuning, and the workhorse method is above-the-line and below-the-line testing. Neither term appears in any SR letter; the method is a financial-crime tuning-validation practice, and it is the evidence examiners look for when a bank changes a threshold. The logic is simple. Production thresholds draw a line, and the question is what sits on each side of it.
Below-the-line testing lowers a threshold or widens a parameter beneath the production setting and reviews a sample of the additional alerts that would have been generated in the gap. It answers the under-detection question: is the current setting suppressing genuinely suspicious activity the bank should be reporting? Above-the-line testing raises a threshold or tightens a parameter above the production setting and reviews a sample of the production alerts that would have been suppressed. It quantifies the false-positive burden the bank could safely shed and, critically, checks whether any true positives sit in the band between the production setting and the tighter one, where a tightening would lose them. Run together, the two tests turn a threshold change from an assertion into an evidenced decision.
| Test | What you change | What you review | What it proves | Evidence output |
|---|---|---|---|---|
| Below-the-line (BTL) | Lower a threshold or widen a parameter beneath the production setting | A sample of the additional alerts created in the gap below the production value | Whether the production setting is missing genuinely suspicious activity (under-detection) | Yield of the gap population, count of would-be SARs missed, a supported case to lower or hold the threshold |
| Above-the-line (ATL) | Raise a threshold or tighten a parameter above the production setting | A sample of the production alerts that would be suppressed at the higher value | Whether tightening would lose any true positives, and how much false-positive volume could be cut | Yield of the suppressed band, count of productive alerts that would be lost, a supported case to raise or hold the threshold |
The output of this exercise is what makes a tuning change defensible. A bank that raises a structuring threshold without above-the-line evidence is asserting that nothing productive lived in the suppressed band; a bank that produces the above-the-line sample is showing it. That difference is exactly what independent testing and a model validation are meant to surface.
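The mechanics of the two tests can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `Event` type, the threshold values, and the pre-filled `productive` flag are hypothetical. In a real exercise the disposition comes from investigator review of the sampled items, and the sample size needs its own documented rationale.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    value: float      # metric the scenario thresholds on (hypothetical)
    productive: bool  # reviewer disposition, pre-filled here for illustration only


def band(events: list[Event], low: float, high: float) -> list[Event]:
    """Return events whose scenario value falls in the half-open band [low, high)."""
    return [e for e in events if low <= e.value < high]


def sample_for_review(population: list[Event], n: int, seed: int = 0) -> list[Event]:
    """Draw a reproducible random sample; a fixed seed lets a third party re-create it."""
    rng = random.Random(seed)
    return rng.sample(population, min(n, len(population)))


def band_yield(reviewed: list[Event]) -> float | None:
    """Share of reviewed events judged productive; None when nothing was reviewed."""
    if not reviewed:
        return None
    return sum(e.productive for e in reviewed) / len(reviewed)


# Hypothetical thresholds: the production line and one candidate on each side of it.
PRODUCTION, BTL_CANDIDATE, ATL_CANDIDATE = 10_000.0, 8_000.0, 12_000.0

events = [
    Event("e0", 7_000.0, False),
    Event("e1", 8_500.0, False),
    Event("e2", 9_200.0, True),
    Event("e3", 9_900.0, False),
    Event("e4", 10_500.0, False),
    Event("e5", 11_000.0, False),
    Event("e6", 11_800.0, False),
    Event("e7", 15_000.0, True),
]

# BTL: activity that did not alert in production but would at the lower setting.
btl_gap = band(events, BTL_CANDIDATE, PRODUCTION)
# ATL: production alerts that would be suppressed at the higher setting.
atl_band = band(events, PRODUCTION, ATL_CANDIDATE)

btl_yield = band_yield(sample_for_review(btl_gap, n=10))
atl_yield = band_yield(sample_for_review(atl_band, n=10))
print(f"BTL gap: {len(btl_gap)} events, yield {btl_yield}")
print(f"ATL band: {len(atl_band)} events, yield {atl_yield}")
```

In this toy data the below-the-line gap contains one productive event and the above-the-line band contains none. A zero yield in a small sample is weak evidence on its own, which is why the sampling approach belongs in the documented record alongside the result.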
Where the AI components land, and the governance gap
Once machine learning enters the stack, the perimeter has to be drawn deliberately, because SR 26-2 draws a hard line that cuts straight through a modern monitoring program. The machine-learning detection and scoring components are models, in scope for validation. But SR 26-2 explicitly excludes generative and agentic AI from its scope. If the program uses generative AI to draft suspicious-activity-report narratives, summarize alerts, or power an investigator copilot, those components are not SR 26-2 models, and the model risk program is not their governing authority. They sit in a governance gap that the institution owns, and they must be controlled under the institution’s own AI risk-management practices rather than treated as validated models. Our analysis of the agentic AI governance gap for banks maps that territory in full.
Two clarifications keep this from becoming a conflation trap. First, the federal model risk regime, SR 26-2, is distinct from NYDFS Part 500, which supervises AI through a cybersecurity lens for New York-licensed entities, and from SEC Regulation S-P, which governs the safeguarding of customer nonpublic personal information. A monitoring model touches all three surfaces, but each is a separate obligation with its own evidence, and folding them together produces gaps rather than coverage. Second, where a monitoring model also influences decisions with disparate-impact exposure, fair lending discipline applies on its own track; the methods in our AI fair lending model validation framework sit beside, not inside, the AML validation described here.
The cost of getting this wrong is not theoretical. The OCC and other regulators have cited banks in formal agreements and consent orders for failing to independently validate AML monitoring systems, as part of broader unsafe-or-unsound BSA/AML findings. An unvalidated monitoring model is a finding waiting to be written, whether the system is a logistic-regression scorer or a deep-learning anomaly detector.
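The perimeter described in this section can be written down as a simple routing table over the component inventory. The sketch below only restates the mapping from the text; the component names and the `ComponentKind` labels are hypothetical, and classifying a real component remains a judgment the institution has to document, not something code decides.

```python
from __future__ import annotations

from enum import Enum


class ComponentKind(Enum):
    STATISTICAL_OR_ML = "statistical_or_ml"
    DETERMINISTIC_RULES = "deterministic_rules"
    GENERATIVE_OR_AGENTIC = "generative_or_agentic"


# Governing tracks per component kind, as described in the surrounding text.
# A rules engine *may* fall outside the model definition; that call is made
# case by case, so this entry reflects the outcome where it does fall outside.
ROUTING: dict[ComponentKind, tuple[str, ...]] = {
    ComponentKind.STATISTICAL_OR_ML: (
        "SR 26-2 model validation",
        "FFIEC independent testing",
    ),
    ComponentKind.DETERMINISTIC_RULES: (
        "FFIEC independent testing",
        "safety-and-soundness expectations",
    ),
    ComponentKind.GENERATIVE_OR_AGENTIC: (
        "institution's own AI risk-management practices",
    ),
}

# Hypothetical inventory of components in a monitoring program.
inventory = {
    "peer_group_segmentation": ComponentKind.STATISTICAL_OR_ML,
    "alert_risk_scorer": ComponentKind.STATISTICAL_OR_ML,
    "cash_threshold_rule": ComponentKind.DETERMINISTIC_RULES,
    "sar_narrative_drafter": ComponentKind.GENERATIVE_OR_AGENTIC,
}


def route(components: dict[str, ComponentKind]) -> dict[str, tuple[str, ...]]:
    """Attach the governing tracks to every component in the inventory."""
    return {name: ROUTING[kind] for name, kind in components.items()}


routed = route(inventory)
for name, tracks in routed.items():
    print(f"{name}: {', '.join(tracks)}")
```

Keeping the mapping explicit makes the governance gap visible: a component typed as generative or agentic never inherits the model validation track by default.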
What this guide is / What it is not
What it is: a practitioner read on validating AML transaction-monitoring systems for a BSA Officer and a Head of Model Risk, covering the SR 26-2 model risk framework, the FFIEC independent-testing obligation, above-the-line and below-the-line tuning, and where machine-learning and generative-AI components fall.
What it is not: legal or regulatory advice, a certification, or a guarantee of any examination or audit outcome. SR 26-2 is non-binding supervisory guidance; the FFIEC manual sets examination expectations, not a model-validation mandate by that name. DSE performs independent validation and prepares organizations for examination and audit. We do not certify, and we do not guarantee that any system will pass an examination. Any vendor promising guaranteed regulatory approval is selling certainty that does not exist.
FAQ
Is AML transaction monitoring a model that has to be validated?
A statistical or machine-learning transaction-monitoring system, including segmentation, anomaly detection, and alert scoring, meets the SR 26-2 model definition and is in scope for model validation. A simple deterministic rules engine may fall outside that definition, but it is still subject to the FFIEC BSA/AML independent-testing obligation and to safety-and-soundness expectations. Either way, the system must be independently tested for effectiveness.
Does SR 26-2 still require model risk management for BSA/AML systems?
Yes. SR 11-7 set the general framework, and the April 9, 2021 interagency statement carried as SR 21-8 affirmed that those principles apply to BSA/AML systems that meet the model definition and tailored how they apply. SR 26-2, issued April 17, 2026, superseded both, so there is no separate BSA/AML model risk statement; qualifying BSA/AML models fold into the general SR 26-2 framework. SR 26-2 is non-binding, but poorly managed model risk can still be treated as unsafe or unsound.
What is above-the-line and below-the-line testing in AML?
Below-the-line testing lowers thresholds and reviews the additional alerts that would be created, to quantify suspicious activity the production settings miss. Above-the-line testing raises thresholds and reviews the alerts that would be suppressed, to quantify false positives and confirm no true positives are wrongly suppressed. Both produce the evidence to justify threshold and parameter changes. Neither term is named in any SR letter; the method is a tuning-validation practice examiners expect where a monitoring system is model-like.
Does the FFIEC BSA/AML manual require model validation?
The FFIEC BSA/AML Examination Manual requires independent testing of the BSA/AML program that is risk-based and performed by qualified parties independent of the BSA function, covering the effectiveness of automated monitoring. It does not use the SR 11-7 or SR 26-2 model-validation vocabulary, but where a monitoring system is model-like, examiners expect to see validation, threshold and scenario tuning, and effectiveness testing as part of that review.
How are AI and generative AI components in transaction monitoring governed?
Machine-learning systems for anomaly detection, alert scoring, and segmentation are models in scope of SR 26-2 validation and FFIEC independent testing. SR 26-2 excludes generative and agentic AI, so components such as alert-narrative drafting or investigator copilots sit outside model risk guidance and must be controlled under the institution’s own AI risk-management practices, not as SR 26-2 models.
The Bottom Line
AML transaction monitoring model validation is no longer governed by a dedicated BSA/AML statement. With SR 26-2 superseding SR 11-7 and the SR 21-8 statement on April 17, 2026, a transaction-monitoring system that meets the revised model definition folds into the general model risk framework, while the FFIEC independent-testing obligation continues to apply on its own track. A BSA Officer and a Head of Model Risk own both, and the work is to validate the monitoring model for conceptual soundness, data integrity, coverage, and effectiveness, then to tune its thresholds with above-the-line and below-the-line evidence rather than assertion.
The line that decides the program is the model definition. Machine-learning detection and scoring are models in scope of validation; a deterministic rules engine may sit outside the definition but never outside independent testing; and generative or agentic components such as alert-narrative drafting fall outside SR 26-2 entirely and have to be governed under the institution’s own AI risk-management practices. Get those boundaries right, document the evidence, and the program is defensible. Get them wrong, and an unvalidated monitoring model becomes the next unsafe-or-unsound finding. Build the validation, the tuning evidence, and the inventory deliberately, and you hold a posture that stands up to scrutiny across all three.
If your AML monitoring program is being pulled into the SR 26-2 model risk perimeter, DSE’s banking AI governance engagement helps banks scope independent validation of transaction-monitoring models, build the above-the-line and below-the-line tuning evidence, and reconcile the machine-learning and generative-AI components against the right authority. Start with an AI governance readiness assessment to surface the gaps, see how each AI use maps to its governing authority in our AI governance for financial services hub, and work the AI governance checklist to assemble the audit-ready evidence. The primary source is the Federal Reserve SR 26-2 letter.
Key facts
- SR 26-2, the interagency Revised Guidance on Model Risk Management issued April 17, 2026, supersedes both SR 11-7 (2011) and the SR 21-8 BSA/AML interagency statement (April 9, 2021), folding BSA/AML systems that meet the revised model definition into the general model risk framework (DSE, 2026).
- The FFIEC BSA/AML Examination Manual makes independent testing a mandatory, risk-based program pillar performed by qualified parties independent of the BSA function, covering the effectiveness of automated transaction-monitoring and suspicious-activity systems (DSE, 2026).
- Above-the-line and below-the-line threshold testing is an AML tuning-validation practice not defined in any SR letter: below-the-line testing lowers thresholds to quantify missed suspicious activity, and above-the-line testing raises thresholds to confirm no true positives are wrongly suppressed (DSE, 2026).
- SR 26-2 narrowed the model definition to complex quantitative methods, so a statistical or machine-learning transaction-monitoring system is a model in scope of validation, while generative and agentic AI components such as alert-narrative drafting are excluded from SR 26-2 and must be governed under the institution’s own AI risk-management practices (DSE, 2026).