A chief risk officer at a regional bank put a question to her model risk lead that had no comfortable home. “We are piloting an agent that opens, escalates, and closes servicing tickets on its own. Under which guidance do we govern it?” Six weeks earlier the answer would have been SR 11-7. Now it is harder, because the agencies have explicitly said the model risk guidance does not reach that system. The thesis of this piece is the part that gets misread most: SR 26-2 pulled agentic AI out of model risk guidance, but it did not pull agentic AI out of regulation. The laws that govern what a bank does to a customer apply in full to what an agent does to a customer, and the governance gap the carve-out created is one each bank now owns.
The governance gap SR 26-2 created
SR 26-2, the Revised Guidance on Model Risk Management, was issued on April 17, 2026. The revised text characterizes generative and agentic AI as “novel and rapidly evolving” and states they are governed by “other risk-management practices” rather than the model risk guidance. The effect is a clean carve-out: the non-deterministic language models and multi-step agents that gave model risk teams the most trouble now sit outside the framework that used to be the default answer for AI. Relative to SR 11-7 and its model risk scope, the move is an evolution with a sharper boundary, not a rewrite.
That boundary creates a gap. Agentic AI must be governed somewhere, and the guidance most banks would reach for first now says, in effect, “not here.” A program that leaves its agents inside the model risk inventory because that is where AI used to live is governing them under a framework the primary guidance has explicitly set aside.
The point has to be said plainly, because the carve-out is so easy to read as relief: it does not mean agentic AI is unregulated. SR 26-2 establishes supervisory expectations for model risk and declines to extend them to agentic systems; it does not, and could not, suspend the statutes and regulations governing how a bank treats its customers, manages its vendors, and runs its operations. An examiner who cannot cite SR 26-2 against your agent can still cite consumer protection law, fair lending law, and third-party risk guidance, none of which carry a generative or agentic exclusion. The carve-out is a scope decision about one piece of supervisory guidance, not a safe harbor.
The agencies have signaled where dedicated treatment is headed: the Federal Reserve, OCC, and FDIC have announced their intent to issue a request for information specifically addressing AI, including generative and agentic AI. As of June 2026 that RFI has not been published, and the defensible posture is not to wait for it. The frameworks that already apply are enough to build a real control program now.
What existing law and guidance does apply
Be precise about what kind of authority each source carries, because the controls a bank owes differ sharply between a statute, a regulation, and a voluntary framework.
CFPB UDAP and UDAAP authority is law, not guidance. The prohibition on unfair, deceptive, and abusive acts or practices sits in statute under Dodd-Frank sections 1031 and 1036, and it reaches the conduct regardless of the tool. If an agent sends a deceptive disclosure, mishandles a dispute in a way that causes substantial injury the customer could not reasonably avoid, or takes an abusive action against an account, the bank is exposed. There is no “the model did it” defense; an autonomous system acting on the bank’s behalf does not change who is accountable.
ECOA and Regulation B are law and regulation that apply directly, not optional best practice. Any agent participating in a credit decision, an adverse action, or a collections workflow has to avoid both disparate treatment and disparate impact, and support accurate, specific adverse action reasons. An agent that denies or restricts credit and cannot produce the principal reasons is a Regulation B problem, not a model-performance footnote.
Third-party risk guidance from the OCC, Federal Reserve, and FDIC applies to the AI supply chain behind the agent. The foundation model provider, the orchestration platform, and the tool integrations an agent calls are critical information and communications technology components of a banking process, carrying the same expectations as any material third-party relationship: due diligence, contract terms covering data use and notice of material changes, and incident management. A provider that silently updates the model your agent runs on has changed a system your customers depend on.
NIST AI RMF is voluntary: it has no certification program and carries no independent compliance obligation or regulatory force in US banking. That said, prudential regulators increasingly cite its concepts in supervisory expectations and examination feedback, making it the practical vocabulary for governance conversations, and its four functions (Govern, Map, Measure, and Manage) are how those conversations increasingly get structured. The practical backbone of AI governance for banks is to pair the binding obligations with a recognized framework, so one program answers the questions a regulator, a board, and a procurement team all ask.
Operational risk frameworks apply the moment an agent runs in production: kill-switches, failover, and incident classification are the controls any system that can affect customers or move money is expected to have, and an agent acting at machine speed across thousands of accounts inherits them in full.
Eight governance gaps examiners are finding
These eight gaps are surfacing in 2026 examinations and supervisory dialogue as agents move from pilot to production. They are the recurring places where a bank’s agentic AI deployment outruns its governance.
- Missing model inventory entries. Agents are listed as “applications” rather than models or AI systems, so they never enter the inventory that governance teams, risk teams, and examiners use to see what the bank runs.
- No autonomous decision boundary document. There is no written, tested statement of what the agent may do without human approval and what it must escalate, so the answer lives in code rather than policy.
- Inadequate pre-deployment validation for multi-step workflows. Validation tests single responses rather than the chained sequence of actions, and a workflow can be plausible at each step yet harmful in aggregate.
- No decision evidence records. The agent does not log, per action, the inputs it saw, the model version it ran, the rules it applied, and its rationale, so no one can reconstruct why it acted.
- No tiering of human-in-the-loop versus human-on-the-loop decisions. Every decision is treated the same, with no deliberate line between actions a human must approve before execution and actions a human only reviews after the fact and can override.
- Weak monitoring. There is no anomaly detection on action patterns, so a denial-rate spike or a drift in which accounts get flagged goes unnoticed until a customer or regulator surfaces it.
- No kill-switch or surge playbook. There is no tested way to halt the agent quickly and no playbook for when it behaves badly at volume. The ability to stop a fast system fast is itself a control.
- Missing policy update process. There is no defined owner for updating agent behavior when a regulation changes, so the agent keeps running the old behavior until someone notices.
A fast way to pressure-test a deployment against this list is to run it through an AI governance checklist and confirm each gap is closed before an examiner finds it open.
The control architecture: five foundations
Closing those gaps is a design problem, not a documentation problem. Five foundations, each mapping directly to gaps above, define the architecture an agentic AI deployment in a regulated bank needs.
A deterministic decision engine underneath the generative interface. Use the language model for what it is good at, understanding and producing natural language, and put a deterministic rules engine underneath it for anything that constitutes a regulated decision. The agent can converse and route in natural language, but the actual decision (the credit outcome, the account action, the disclosure sent) runs through logic you can read, test, and reproduce. Do not let a free-form language model generate the reason codes for a regulated decision; those come from the rules engine so they are accurate, consistent, and defensible.
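One way to picture this split is a rules function that owns both the outcome and the reason codes, with the language model limited to the wording around them. The sketch below is a minimal illustration, not a production engine: the thresholds, field names, and reason-code strings (`MIN_CREDIT_SCORE`, `MAX_DEBT_TO_INCOME`, `R01_...`, `R02_...`) are hypothetical and would in practice come from the bank's approved credit policy.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical policy values and reason codes, for illustration only.
MIN_CREDIT_SCORE = 640
MAX_DEBT_TO_INCOME = 0.43


@dataclass(frozen=True)
class Application:
    credit_score: int
    debt_to_income: float


@dataclass(frozen=True)
class Decision:
    approved: bool
    reason_codes: Tuple[str, ...]


def decide(app: Application) -> Decision:
    """Apply fixed rules in a fixed order, so the same input always yields the same output."""
    reasons = []
    if app.credit_score < MIN_CREDIT_SCORE:
        reasons.append("R01_CREDIT_SCORE_BELOW_MINIMUM")
    if app.debt_to_income > MAX_DEBT_TO_INCOME:
        reasons.append("R02_DEBT_TO_INCOME_ABOVE_MAXIMUM")
    return Decision(approved=not reasons, reason_codes=tuple(reasons))


def build_notice(decision: Decision, friendly_text: str) -> dict:
    """Combine model-written wording with engine-owned codes.

    The generative layer may supply friendly_text, but it never supplies
    or edits approved or reason_codes.
    """
    return {
        "approved": decision.approved,
        "reason_codes": list(decision.reason_codes),
        "message": friendly_text,
    }
```

Because `decide` is a pure function of its input, a reviewer can replay any past application and get the same outcome and the same codes, which is the property a free-form generation cannot offer.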
Autonomous decision boundaries per use case. For each use case, write down and test the line between what the agent may do on its own and what it must escalate. The boundary is specific, not aspirational: a servicing agent may issue a standard fee reversal up to a defined dollar amount but must escalate anything above it; a fraud agent may clear a low-risk alert but may not close an account.
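A boundary like this can be expressed as data that policy owners can read and tests can exercise, rather than as logic scattered through agent code. The following is a minimal sketch under stated assumptions: the use-case names, action names, and the 50.00 reversal limit are hypothetical, amounts are held in integer cents, and anything not explicitly listed escalates by default.

```python
from enum import Enum


class Outcome(Enum):
    AUTONOMOUS = "autonomous"
    ESCALATE = "escalate"


# Hypothetical boundary table; the real limits belong in approved policy.
# A missing "max_amount_cents" means the action carries no dollar limit.
BOUNDARIES = {
    ("servicing", "standard_fee_reversal"): {"max_amount_cents": 5000},
    ("fraud", "clear_low_risk_alert"): {},
}


def check_boundary(use_case: str, action: str, amount_cents: int = 0) -> Outcome:
    """Return whether the agent may act alone or must hand off to a human."""
    rule = BOUNDARIES.get((use_case, action))
    if rule is None:
        # Default deny: an action nobody wrote down is never autonomous.
        return Outcome.ESCALATE
    limit = rule.get("max_amount_cents")
    if limit is not None and amount_cents > limit:
        return Outcome.ESCALATE
    return Outcome.AUTONOMOUS
```

With this table, a 25.00 fee reversal is autonomous, a 75.00 reversal escalates, and a fraud agent asking to close an account escalates because that action is not listed at all.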
Human-in-the-loop versus human-on-the-loop tiering. High-risk actions, such as a SAR filing, an account closure, or a credit adverse action, sit in the loop: a human approves before execution. High-volume routine tasks, where approving each one would defeat the purpose, sit on the loop: the agent acts and a human monitors, samples, and can override. This is exactly the distinction examiners are now asking banks to articulate.
Decision evidence records and a comprehensive audit trail. Log every agent action with the agent’s identity, the model version it ran, the data sources and inputs it used, and the decision rationale. Design for reconstruction: a reviewer should be able to take any single action and rebuild why it happened. Pair this with least-privilege, scoped API access per agent, so each agent touches only the systems and data its use case requires.
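The tiering can be made explicit in code by assigning every action a tier and refusing to execute in-the-loop actions without a named approver. This is a rough sketch, assuming a hypothetical `ACTION_TIERS` mapping and a hypothetical `TieredExecutor` class; the action names are illustrative, and unknown actions fall into the stricter tier.

```python
from enum import Enum
from typing import List, Optional


class Tier(Enum):
    IN_THE_LOOP = "human approves before execution"
    ON_THE_LOOP = "agent acts, human monitors and can override"


# Hypothetical tier assignments; the real mapping belongs in approved policy.
ACTION_TIERS = {
    "sar_filing": Tier.IN_THE_LOOP,
    "account_closure": Tier.IN_THE_LOOP,
    "credit_adverse_action": Tier.IN_THE_LOOP,
    "clear_low_risk_alert": Tier.ON_THE_LOOP,
    "standard_fee_reversal": Tier.ON_THE_LOOP,
}


class TieredExecutor:
    def __init__(self) -> None:
        self.pending_approval: List[str] = []  # waiting for a human decision
        self.review_log: List[str] = []        # executed, available for sampling

    def submit(self, action: str, approved_by: Optional[str] = None) -> str:
        # Unknown actions default to the stricter tier.
        tier = ACTION_TIERS.get(action, Tier.IN_THE_LOOP)
        if tier is Tier.IN_THE_LOOP and approved_by is None:
            self.pending_approval.append(action)
            return "pending_approval"
        if tier is Tier.ON_THE_LOOP:
            self.review_log.append(action)
        return "executed"
```

The design choice worth noting is the default: an action that was never classified is treated as in-the-loop, so a new capability cannot become autonomous by omission.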
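A decision evidence record can be as simple as one structured log line per action. The sketch below is illustrative only: the field names and the `DecisionEvidenceRecord` type are hypothetical, and the SHA-256 digest is an added assumption that shows one simple way to detect later edits to a record. It is not a substitute for write-once storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DecisionEvidenceRecord:
    agent_id: str
    model_version: str
    action: str
    data_sources: Tuple[str, ...]
    inputs: dict
    rationale: str
    recorded_at: str  # ISO 8601 timestamp supplied by the caller


def _digest(record_dict: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same.
    body = json.dumps(record_dict, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def to_log_line(record: DecisionEvidenceRecord) -> str:
    """Serialize one action as a single JSON line with a content digest."""
    record_dict = asdict(record)
    return json.dumps({"record": record_dict, "sha256": _digest(record_dict)},
                      sort_keys=True)


def verify_log_line(line: str) -> bool:
    """Recompute the digest to check the stored record was not altered."""
    entry = json.loads(line)
    return _digest(entry["record"]) == entry["sha256"]
```

A reviewer reconstructing an action would read the line back, confirm it verifies, and then replay the recorded inputs against the recorded model version and rules.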
Kill-switches and real-time monitoring. Build a tested way to halt the agent fast, wired to real-time monitoring with predefined thresholds and automated triggers. The monitoring watches action patterns, not just uptime: a denial-rate spike, an escalation-volume collapse, an unusual concentration of actions against a customer segment. When a threshold trips, the system can throttle or stop the agent automatically rather than waiting for a human to notice a dashboard.
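The threshold-and-trigger idea can be sketched as a rolling-window monitor on one action pattern, here the denial rate. Everything specific in this example is an assumption for illustration: the `KillSwitchMonitor` class, the window size, and the 30% throttle and 50% halt thresholds are hypothetical, and a real deployment would watch several patterns at once.

```python
from collections import deque


class KillSwitchMonitor:
    """Track recent outcomes and throttle or halt the agent on a denial-rate spike."""

    def __init__(self, window: int = 100, min_samples: int = 20,
                 throttle_at: float = 0.30, halt_at: float = 0.50) -> None:
        self.outcomes = deque(maxlen=window)  # True means the action was a denial
        self.min_samples = min_samples
        self.throttle_at = throttle_at
        self.halt_at = halt_at
        self.state = "running"

    def record(self, denied: bool) -> str:
        if self.state == "halted":
            return self.state  # stays halted until a human resets it
        self.outcomes.append(denied)
        if len(self.outcomes) >= self.min_samples:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate >= self.halt_at:
                self.state = "halted"
            elif rate >= self.throttle_at:
                self.state = "throttled"
            else:
                self.state = "running"
        return self.state

    def allow_action(self) -> bool:
        """The agent checks this before every action."""
        return self.state != "halted"

    def reset(self) -> None:
        """Manual restart after human review; clears the window."""
        self.outcomes.clear()
        self.state = "running"
```

The halt is deliberately sticky: once the threshold trips, later healthy outcomes do not restart the agent, and only an explicit `reset` after human review does.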
Use-case controls
The architecture is general; the controls get specific by use case, because the binding obligations differ by what the agent touches.
Credit and underwriting agents carry the full weight of fair-lending control. An agent in a credit decision is subject to the same ECOA and Regulation B obligations as any credit model: disparate treatment and disparate impact testing, and accurate, specific adverse action reasons for every denial or downgrade. The reason codes must come from the deterministic engine, never a free-form language-model generation, because a reason that cannot be reproduced and defended is a Regulation B exposure.
Collections and servicing agents live squarely under UDAP and UDAAP authority, and under the FDCPA for entities it covers. An agent that communicates about debt, applies fees, or negotiates arrangements can produce unfair, deceptive, or abusive outcomes at scale, so the controls must include detection of vulnerable customers and escalation paths for situations a fully automated workflow should not own. The agent should recognize when it is out of its depth (a hardship indicator, a dispute, a customer in distress) and route to a human.
AML and fraud agents can earn autonomy on the low-risk end while staying tightly governed on the high-risk end. An agent can auto-clear low-risk alerts so analysts focus on real signals, but SAR filings and account closures require a human in the loop. The audit trail must let the money laundering reporting officer reconstruct every decision, including the alerts it cleared, since a cleared alert is a decision too. The supervisory trend is most visible here: attention to AI governance controls, including the ability to halt or override AI systems in production, has become a consistent theme in examination preparation and industry dialogue, and an AML or fraud agent that cannot be stopped quickly is exactly the case that attention is aimed at.
What this guide is / What it is not
What it is: a practitioner guide to governing agentic AI in a bank when SR 26-2 has carved those systems out of model risk guidance, mapping the laws and frameworks that still apply, the governance gaps examiners are finding, and the control architecture a regulated deployment needs. What it is not: legal or regulatory advice, a certification, or a guarantee of any exam or audit outcome. DSE prepares organizations for audit and examination and strengthens the governance posture behind your AI systems. We do not certify, and we do not guarantee any exam or audit result. SR 26-2 establishes supervisory expectations and does not apply to generative or agentic AI; the carve-out does not make those systems unregulated, since UDAP/UDAAP, ECOA and Regulation B, and third-party risk rules still apply. NIST AI RMF is voluntary with no certification program.
FAQ
Does SR 26-2 apply to agentic AI?
No. SR 26-2 explicitly carved it out. But the carve-out does not make agentic AI unregulated: CFPB UDAP/UDAAP authority under Dodd-Frank, ECOA and Regulation B, and interagency third-party risk rules all still apply, none of which carry a generative or agentic exclusion. It is a scope decision about one piece of supervisory guidance, not a safe harbor.
What laws apply to an AI agent that makes customer-facing decisions?
The laws that govern the conduct apply regardless of the tool. CFPB UDAP/UDAAP authority under Dodd-Frank sections 1031 and 1036 is statutory law reaching unfair, deceptive, or abusive customer actions, with no model-did-it defense. ECOA and Regulation B are law and regulation requiring any agent in credit decisions or adverse actions to avoid disparate treatment and impact and support accurate adverse action reasons. Interagency third-party risk guidance governs the model and orchestration vendors behind the agent.
What is an autonomous decision boundary?
An autonomous decision boundary is a written, tested statement of exactly what an AI agent may do on its own without human approval and what it must escalate. It is specific to each use case: a servicing agent might issue a fee reversal up to a set dollar amount but escalate anything above it. Without it, the answer lives in code rather than policy, which is one of the governance gaps examiners are finding.
Do banks need a kill-switch for agentic AI?
Yes. A tested way to halt an agentic system quickly is an operational risk control any production system that can affect customers or move money is expected to have, and it has become a supervisory focus: attention to AI governance controls, including the ability to halt or override AI systems in production, is now a consistent theme in examination preparation and industry dialogue. Pair it with real-time monitoring on action patterns and thresholds that can throttle or stop the agent automatically.
DSE’s banking AI governance practice helps banks map their agentic AI deployments to the frameworks that do apply (NIST AI RMF, UDAP/UDAAP exposure analysis, third-party risk, and operational controls) and builds the governance infrastructure that SR 26-2 does not provide. The AI governance readiness assessment delivers a risk register and audit-ready evidence package at a fixed fee.
Key facts
- SR 26-2 explicitly excluded generative and agentic AI from model risk guidance scope
- CFPB UDAP/UDAAP, ECOA/Reg B, and third-party risk rules still apply to agentic AI
- Supervisory attention to AI governance controls, including the ability to halt or override AI systems in production, has become a consistent theme in examination preparation and industry dialogue
- Banks need autonomous decision boundaries, decision evidence records, and kill-switches even without formal agentic AI guidance