Executive Summary
Most enterprise AI projects do not die because the model was wrong. They die because three questions were never answered before the build started: Is the data semantically ready, or just present? Who owns governance, and how is access actually enforced? What does “done” mean, and how will you measure it in production? This is the architecture brief we run at the front of every engagement. Answer all three with evidence and you have a project that ships. Skip any one and you have a pilot that quietly stalls in month four.
The Failure Pattern Nobody Names
Walk into a stalled enterprise AI project and the post-mortem almost always points at the model. Wrong embedding choice. Bad prompt. The vendor oversold the benchmark. These are the symptoms everyone is comfortable discussing because they are technical, bounded, and someone else’s fault.
The real cause is upstream. The project was greenlit before anyone could answer three questions in writing. The team started building because building feels like progress, and the questions were treated as things to “figure out as we go.”
You do not figure them out as you go. You discover, in month four, that the data was never structured to answer the question being asked, nobody can approve who sees what, and there was never an agreed definition of working. By then the budget is half gone and the demo still only works on the three documents from the original notebook.
This brief is the antidote. It is three questions. We do not start an engagement until all three have evidence-backed answers, not opinions. Run it on your own roadmap before your next AI investment.
Question 1: Is the Data Ready, or Just Present?
There is a difference between data existing and data being usable for the task. Presence is a database with rows in it. Readiness is whether those rows carry the meaning your AI system needs to reason correctly.
The trap is that presence is easy to demonstrate and readiness is not. A stakeholder shows you a 40-table warehouse and says “the data is all there.” It is there. It is also undocumented, full of overloaded columns (a status field that means six different things depending on the source system), and joined on keys that silently changed format in 2023.
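Two of those symptoms, overloaded columns and keys whose format changed, can be surfaced mechanically before anyone argues about meaning. The sketch below is a minimal illustration, not a profiling tool. It lists which values of a column each source system emits, and it reduces join keys to their format per year. The column names (`source_system`, `status`, `customer_key`, `year`) and the sample rows are hypothetical.

```python
from collections import defaultdict

def distinct_values_by_source(rows, column, source_key="source_system"):
    """Collect the distinct values of `column` observed in each source system."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[source_key]].add(row[column])
    return dict(seen)

def divergent_values(values_by_source):
    """Values that some sources emit and others never do.

    This cannot tell you what a value means. It only shows where the
    vocabularies disagree, which is where to start asking questions.
    """
    if not values_by_source:
        return set()
    value_sets = list(values_by_source.values())
    return set().union(*value_sets) - set.intersection(*value_sets)

def key_shape(key):
    """Reduce a key to its format: digits become 9, letters become A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in key)

def key_shapes_by_year(rows, key_column="customer_key", year_column="year"):
    """Distinct key formats per year; more than one format across years is a join risk."""
    shapes = defaultdict(set)
    for row in rows:
        shapes[row[year_column]].add(key_shape(row[key_column]))
    return dict(shapes)

# Hypothetical extract: two source systems feeding one warehouse table.
rows = [
    {"source_system": "crm", "status": "active", "customer_key": "CUST-00123", "year": 2022},
    {"source_system": "crm", "status": "closed", "customer_key": "CUST-00456", "year": 2022},
    {"source_system": "billing", "status": "active", "customer_key": "00123", "year": 2023},
    {"source_system": "billing", "status": "3", "customer_key": "00789", "year": 2023},
]

print(divergent_values(distinct_values_by_source(rows, "status")))  # contains 'closed' and '3'
print(key_shapes_by_year(rows))  # {2022: {'AAAA-99999'}, 2023: {'99999'}}
```

A non-empty result does not prove a column is overloaded. It tells you which values to take to the owners of each source system and ask what they mean.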
What to actually interrogate
- Semantic integrity. Does a field mean the same thing across every source that feeds it? Where are the overloaded columns, the free-text fields holding structured intent, the enums that drifted?
- Provenance and freshness. Where did each record come from, when was it last valid, and does the system know the difference between “no value” and “value is zero”?
- Chunking and context survival. For retrieval systems, does your content survive being split? A policy that reads “employees qualify after one year — unless they are in Sales, who qualify immediately” becomes dangerous nonsense the moment chunking severs the exception from the rule.
- Coverage of the actual question. The data can be pristine and still not contain what the use case requires. Pristine sales records do not answer a churn-cause question if the churn reasons live in support tickets nobody ingested.
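The chunking failure is easy to reproduce and, more usefully, easy to test for. The sketch below uses a naive word-count chunker as a stand-in for whatever token-based splitter a real pipeline uses, plus one alternative that packs whole sentences. `pair_survives` is a hypothetical readiness check: given a rule and its exception, it asks whether any single chunk still contains both. The policy text is illustrative.

```python
import re

# Hypothetical policy text; the rule and its exception share one sentence.
POLICY = (
    "Eligibility. Employees qualify after one year, unless they are in Sales, "
    "who qualify immediately. Benefits begin the following month."
)

def chunk_by_words(text, max_words):
    """Naive fixed-size chunking: cut every max_words words, ignoring meaning."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def chunk_by_sentences(text, max_words):
    """Pack whole sentences into chunks of roughly max_words words.

    A sentence longer than the budget is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def pair_survives(chunks, rule, exception):
    """True if at least one chunk contains both the rule and its exception."""
    return any(rule in chunk and exception in chunk for chunk in chunks)

RULE, EXCEPTION = "qualify after one year", "unless they are in Sales"

print(pair_survives(chunk_by_words(POLICY, 6), RULE, EXCEPTION))      # False
print(pair_survives(chunk_by_sentences(POLICY, 6), RULE, EXCEPTION))  # True
```

Sentence packing helps here only because the exception lives in the same sentence as the rule; an exception in a later paragraph would need section-aware chunking or metadata linking the two. The durable part is the check itself: a list of rule and exception pairs that must co-occur in some chunk is cheap to maintain and can rerun every time the chunking configuration changes.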
The honest answer to Question 1 is frequently: “The data is present and the data is not ready.” That is a finding, not a failure. It tells you the first sprint is data engineering, not model selection, and it saves you from building a system on a foundation that produces confidently wrong answers.
Question 2: Who Owns Governance, and How Is Access Enforced?
The second question kills more projects in regulated and federal environments than any technical constraint. It is deceptively simple: who is allowed to see what the system produces, who approves that, and how is it enforced in the architecture rather than in a policy document?
Teams routinely defer this. They build against a copy of production data in a dev account, get a great demo, and then discover that the data they used can never legally flow through the system they designed. The retrieval layer has no concept of row-level permissions. The model can surface a document to a user who was never cleared for it. There is no audit trail showing who asked what.
What to actually interrogate
- Ownership. Is there a named accountable owner for the data domain — not a committee, a person who can approve access decisions and be held to them?
- Access enforcement in the architecture. Are permissions enforced at the retrieval and API layer, or are they assumed because “only the right people have logins”? Identity must propagate from the request, through auth, into what the system is allowed to retrieve.
- Auditability. Can you reconstruct, after the fact, who asked the system what and what it returned? In regulated contexts this is not optional and cannot be retrofitted cheaply.
- Data classification boundaries. Does the system understand the difference between tiers of sensitivity, and does it refuse to cross them by design rather than by prompt instruction?
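What “enforced in the architecture” means can be shown in a few lines. The permission check runs before ranking, so a document the user cannot see never becomes a candidate, and every call leaves an audit record. This is a minimal sketch under stated assumptions. The tier names, group model, toy keyword ranking, and in-memory audit list are all hypothetical. The `User` is assumed to be built from the authenticated request rather than from anything typed into the prompt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum

class Tier(IntEnum):
    """Hypothetical sensitivity tiers; higher means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    tier: Tier
    allowed_groups: frozenset

@dataclass(frozen=True)
class User:
    """Assumed to come from the authenticated request (e.g. token claims), never from prompt text."""
    user_id: str
    clearance: Tier
    groups: frozenset

@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    user_id: str
    query: str
    returned_ids: tuple

def can_read(user, doc):
    # Both checks must pass: clearance covers the tier AND the user shares a group.
    return doc.tier <= user.clearance and bool(doc.allowed_groups & user.groups)

def retrieve(query, user, corpus, audit_log, k=3):
    """Filter by permission BEFORE ranking, then record who asked what and what came back."""
    visible = [doc for doc in corpus if can_read(user, doc)]
    terms = set(query.lower().split())

    def overlap(doc):
        return len(terms & set(doc.text.lower().split()))

    # Toy keyword-overlap ranking standing in for a real retriever.
    ranked = sorted((d for d in visible if overlap(d) > 0), key=overlap, reverse=True)
    results = ranked[:k]
    audit_log.append(AuditEntry(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user.user_id,
        query=query,
        returned_ids=tuple(d.doc_id for d in results),
    ))
    return results

corpus = [
    Document("d1", "Holiday schedule for all staff", Tier.PUBLIC, frozenset({"all"})),
    Document("d2", "Salary bands for engineering staff", Tier.CONFIDENTIAL, frozenset({"hr"})),
    Document("d3", "Engineering onboarding guide for staff", Tier.INTERNAL, frozenset({"eng"})),
]
alice = User("alice", Tier.INTERNAL, frozenset({"all", "eng"}))
bob = User("bob", Tier.CONFIDENTIAL, frozenset({"all", "hr"}))

log = []
print([d.doc_id for d in retrieve("staff salary bands", alice, corpus, log)])  # ['d1', 'd3']
print([d.doc_id for d in retrieve("staff salary bands", bob, corpus, log)])    # ['d2', 'd1']
```

With a vector store, the same filter would typically be pushed down as a metadata filter on the query itself, where the store supports one, and the audit log would go to append-only storage. The point that carries over is the ordering: filter, then rank, then log.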
Governance is an architecture decision, not a compliance checkbox bolted on before launch. If access control is an afterthought, it becomes the thing that prevents production deployment entirely.
Question 3: What Does “Done” Mean, and How Will You Know It Works?
The third question is the one most teams cannot answer, and its absence is the quietest killer. Without it, quality regresses silently. A prompt change improves one case and breaks four others, and nobody notices until a user complains in production.
“Done” is not “the demo worked.” Done is a measurable definition of acceptable behavior, plus a mechanism that tells you whether you still meet it after every change.
What to actually interrogate
- A definition of correct. For the actual use case, what is a right answer, a wrong answer, and an acceptable refusal? Write it down. If you cannot, the use case is not specified well enough to build.
- An evaluation harness. Is there a curated set of representative cases with known-good outcomes that runs on every change and gates releases? Without it, you are flying on vibes and the most recent demo.
- Production observability. Once live, how do you detect drift, hallucination, and degraded retrieval before users do? Offline eval is necessary and not sufficient.
- A regression gate. Does a quality drop block a release the way a failing unit test blocks a merge? If quality is not enforced in the pipeline, it will erode.
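An evaluation harness and regression gate do not need to be elaborate to be useful. The sketch below assumes the system under test is a function from question to answer that returns `None` when it refuses. It encodes the three outcomes above. A right answer contains a required fact. A wrong answer does not. A refusal is correct only for cases marked as such. Substring matching is a deliberately crude stand-in for whatever grading the real use case needs, and the cases and both system versions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass(frozen=True)
class Case:
    case_id: str
    question: str
    expected: Optional[str]  # None means the correct behavior is to refuse

def grade(case: Case, response: Optional[str]) -> bool:
    """Written-down definition of correct: refuse where expected, else contain the required fact."""
    if case.expected is None:
        return response is None
    return response is not None and case.expected.lower() in response.lower()

def run_eval(cases, system: Callable[[str], Optional[str]]) -> Dict[str, bool]:
    return {case.case_id: grade(case, system(case.question)) for case in cases}

def regression_gate(baseline, candidate, min_pass_rate=0.9):
    """Block the release if any previously passing case fails, or overall quality is too low."""
    reasons = []
    regressed = sorted(cid for cid, ok in baseline.items() if ok and not candidate.get(cid, False))
    if regressed:
        reasons.append("regressed: " + ", ".join(regressed))
    pass_rate = sum(candidate.values()) / len(candidate) if candidate else 0.0
    if pass_rate < min_pass_rate:
        reasons.append(f"pass rate {pass_rate:.2f} < {min_pass_rate:.2f}")
    return not reasons, reasons

CASES = [
    Case("sales", "When do Sales employees qualify?", "immediately"),
    Case("general", "When do other employees qualify?", "one year"),
    Case("salary", "What is Alice's salary?", None),
]

# Two hypothetical versions of the system; a missing key (None) means it refused.
v1 = {"When do Sales employees qualify?": "Sales employees qualify immediately.",
      "When do other employees qualify?": "After one year."}
v2 = {**v1, "When do Sales employees qualify?": "Employees qualify after one year."}

baseline = run_eval(CASES, v1.get)
candidate = run_eval(CASES, v2.get)
ok, reasons = regression_gate(baseline, candidate)
print(ok, reasons)  # False ['regressed: sales', 'pass rate 0.67 < 0.90']
```

The gate compares case by case against the baseline rather than only the aggregate score. That comparison is what catches a change that fixes one case and quietly breaks others. In CI, a false `ok` would exit nonzero and block the release.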
The discipline here is borrowed from software engineering and applied to a probabilistic system: you do not get to call it working until you can prove it is working and prove it stays working.
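Proving that it stays working is a production concern. Degraded retrieval, one of the signals named above, can be watched with very little machinery. This is a minimal sketch. It assumes the retriever reports a similarity score for its best match, and that a baseline mean was measured on traffic the team accepted as healthy. The window size and tolerance are illustrative, not recommendations. It does not detect hallucination, which needs other signals.

```python
from collections import deque
from statistics import fmean

class RetrievalDriftMonitor:
    """Alert when the rolling mean of top retrieval scores falls below a baseline.

    Assumes the retriever reports a similarity score for its best match and that
    baseline_mean was measured on traffic the team accepted as healthy.
    """

    def __init__(self, baseline_mean: float, tolerance: float, window: int = 50):
        self.baseline_mean = baseline_mean
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)  # oldest scores fall off automatically

    def observe(self, top_score: float) -> bool:
        """Record one query's top score; return True if the window signals degradation."""
        self.scores.append(top_score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        return fmean(self.scores) < self.baseline_mean - self.tolerance

monitor = RetrievalDriftMonitor(baseline_mean=0.80, tolerance=0.10, window=5)
healthy = [0.82, 0.79, 0.81, 0.80, 0.78]
degraded = [0.50, 0.50, 0.50]
alerts = [monitor.observe(s) for s in healthy + degraded]
print(alerts)  # [False, False, False, False, False, False, True, True]
```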
The Brief as a One-Page Gate
Run the brief as a literal gate before funding the build. Each question gets an evidence-backed answer, not an aspiration.
| Question | Evidence required to pass | Common failing answer |
|---|---|---|
| 1. Data readiness | Documented semantic integrity, provenance, and coverage for the specific task | “The data is all there” |
| 2. Governance & access | Named owner, enforcement at retrieval/API layer, audit trail design | “Only the right people have access” |
| 3. Definition of done | Written correctness criteria, eval harness, regression gate, production observability | “The demo worked great” |
If any row is a failing answer, the project is not ready to build. The first work item is to convert that failing answer into a passing one — which is real, fundable, scoped work, and far cheaper than discovering the gap in month four.
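The gate can live as a checked-in artifact rather than a slide, so that “not ready” arrives with a list of what is missing. This is a minimal sketch that assumes evidence is tracked as named items. The labels paraphrase the table's evidence column. In practice, each would link to a real document, test run, or dashboard.

```python
from dataclasses import dataclass, field

@dataclass
class GateItem:
    question: str
    required: tuple                             # evidence the gate says must exist
    provided: set = field(default_factory=set)  # evidence actually produced so far

    def missing(self):
        return [item for item in self.required if item not in self.provided]

def evaluate_brief(items):
    """Pass only if every question has all of its required evidence."""
    gaps = {}
    for item in items:
        missing = item.missing()
        if missing:
            gaps[item.question] = missing
    return not gaps, gaps

brief = [
    GateItem("1. Data readiness",
             ("semantic integrity", "provenance", "coverage"),
             {"semantic integrity", "provenance", "coverage"}),
    GateItem("2. Governance & access",
             ("named owner", "retrieval/API enforcement", "audit trail design"),
             {"named owner", "retrieval/API enforcement"}),
    GateItem("3. Definition of done",
             ("correctness criteria", "eval harness", "regression gate", "production observability"),
             {"correctness criteria"}),
]

ready, gaps = evaluate_brief(brief)
print(ready)  # False
for question, missing in gaps.items():
    print(question, "->", missing)
```

Each entry in `gaps` is a candidate first work item: the scoped effort of converting a failing answer into a passing one.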
What This Means For You
The three questions are not gatekeeping for its own sake. They are the difference between an AI investment that compounds and one that becomes a line item your CFO asks about next budget cycle.
Before your next AI project gets a sprint plan, get the brief answered in writing. If the answers are honest and uncomfortable, you have just saved a quarter of wasted effort. If the answers come back clean and evidence-backed, build with confidence — you are in the minority that actually will ship.
The model was never the hard part. The brief is.
This brief reflects the assessment framework our team runs at the front of enterprise and regulated-industry AI engagements. It is published as a reference for data and technology leaders evaluating AI investments.