Secure Multi-Tenant LLM Platform: A Build-and-Transfer Framework for Regulated Industries
Executive Summary
Most LLM products start as a single-tenant prototype and break the moment a second customer signs. The shortcuts that make a demo fast—shared prompts, one database, an API key in the code—become liabilities the instant a regulated buyer asks how their data is isolated from everyone else’s.
We built this framework while delivering a production multi-tenant LLM SaaS platform under a build-and-transfer engagement: roughly an eleven-week effort from architecture to a working system, ending in a full intellectual-property handoff to the client’s own team. The brief was unforgiving—dozens of API endpoints, per-tenant data and model isolation, identity-provider authentication enforced before any business logic ran, and per-tenant cost attribution that finance could actually reconcile.
The result is a reference architecture, not a product. It assumes the buyer operates in a regulated environment, expects auditors, and intends to own and run the system after we leave. Every decision below is shaped by those three constraints.
The Multi-Tenant Problem Nobody Wants to Talk About
“Multi-tenant” gets used loosely. There is a meaningful difference between a system where tenants are a column in a shared table and a system where a tenant boundary is enforced at every layer—identity, routing, storage, secrets, and billing.
For a regulated client, the loose version is disqualifying. The questions that decide a deal are not about model quality:
- Can tenant A’s prompt or output ever reach tenant B?
- Where is the boundary enforced, and what happens if a single check is bypassed?
- How do you prove, after the fact, which tenant incurred which cost?
- Who holds the secrets, and are they ever written to disk or source control?
A platform answers these with architecture, not policy documents. The framework below is organized around enforcing the tenant boundary in depth, so that no single failure collapses isolation.
Framework Architecture
Request Lifecycle
┌──────────────────────────────────────────────────────────────────────┐
│                         Client / Tenant App                          │
│             (carries a signed JWT, scoped to one tenant)             │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ Authorization: Bearer <JWT>
                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│                       HTTP API (≈66 endpoints)                       │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │ Gateway Authorizer ── validates JWT (RS256 / JWKS)             │  │
│  │ • signature + expiry    • tenant_id claim extracted            │  │
│  │ • rejected here → request never reaches business logic         │  │
│  └────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ (authorized + tenant context)
                                    ▼
┌──────────────────────────────────────────────────────────────────────┐
│                      Application Layer (Lambda)                      │
│  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐  │
│  │ Tenant-scoped    │   │ Model Router     │   │ Cost Metering    │  │
│  │ data access      │   │ (per-tenant      │   │ (per-tenant      │  │
│  │                  │   │ model + key)     │   │ token/usage)     │  │
│  └────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘  │
└───────────┼──────────────────────┼──────────────────────┼────────────┘
            ▼                      ▼                      ▼
     ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
     │ RDS Postgres │       │ Secrets      │       │ Usage /      │
     │ (row + schema│       │ Manager      │       │ Billing      │
     │ scoped)      │       │ (no keys in  │       │ Store        │
     │              │       │ code)        │       │              │
     └──────────────┘       └──────────────┘       └──────────────┘
Layer 1 — Identity at the Edge
Authentication is enforced before the request reaches any business logic. The client carries a JSON Web Token issued by a managed identity provider (a Clerk-style flow), signed with RS256 and verifiable against a published JWKS endpoint.
An authorizer sitting in front of the HTTP API validates the signature against the rotating public keys, checks expiry, and extracts the tenant_id claim. A request with a missing, expired, or malformed token is rejected at the gateway—it never invokes a function, never touches the database, and never appears in application logs as anything but a denied request.
This matters for two reasons. First, the most expensive part of the stack (model inference) is never reached by unauthenticated traffic. Second, the tenant identity arrives as a cryptographically signed claim, not a value the application has to look up or trust from the request body.
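The edge checks described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the delivered authorizer: `authorize`, `Denied`, and the `verify_signature` hook are hypothetical names, and the hook stands in for a real RS256 check against the provider's JWKS keys, which in practice would come from a maintained JWT library rather than hand-written code.

```python
import base64
import json
import time
from typing import Callable, Optional


class Denied(Exception):
    """The request is rejected at the edge and never reaches business logic."""


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def authorize(
    token: str,
    verify_signature: Callable[[bytes, bytes, str], bool],
    now: Optional[float] = None,
) -> str:
    """Return the verified tenant_id claim, or raise Denied."""
    parts = token.split(".")
    if len(parts) != 3:
        raise Denied("malformed token")
    try:
        header = json.loads(_b64url_decode(parts[0]))
        claims = json.loads(_b64url_decode(parts[1]))
        signature = _b64url_decode(parts[2])
        signing_input = f"{parts[0]}.{parts[1]}".encode("ascii")
    except ValueError as exc:  # bad base64, bad JSON, or non-ASCII input
        raise Denied("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise Denied("malformed token")

    # Pin the algorithm: the token must not be able to choose "none".
    if header.get("alg") != "RS256":
        raise Denied("unexpected algorithm")
    # Hypothetical hook: an RS256 check against the provider's JWKS keys.
    if not verify_signature(signing_input, signature, str(header.get("kid", ""))):
        raise Denied("bad signature")

    # Claims are trusted only after the signature check has passed.
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise Denied("token expired or missing exp")
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        raise Denied("missing tenant_id claim")
    return tenant_id
```

The function returns only the tenant identifier, so downstream code receives a value that has already passed the signature, algorithm, and expiry checks and has no reason to read tenant identity from the request body.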
Layer 2 — Tenant Isolation in Depth
The tenant_id extracted at the edge becomes the spine of every downstream decision. Isolation is enforced at three points so that no single bug breaks the boundary:
- Data isolation. Relational data lives in RDS Postgres. Every query is scoped to the authenticated tenant: no code path reads tenant data without a tenant filter, and that filter is derived from the verified claim, never from user-supplied input. Depending on the regulatory profile, this is implemented as enforced row scoping or as per-tenant schema separation; the framework supports both, with the harder boundary reserved for the most sensitive tenants.
- Model routing isolation. A model router maps each tenant to its permitted model(s) and the credentials used to call them. One tenant cannot route a request through another tenant’s model configuration or provider key. This is also where model-level policy lives: which providers a given tenant is contractually allowed to use, and which are blocked.
- Secrets isolation. Provider keys and tenant credentials live in a managed secrets manager and are fetched at runtime. No key is committed to source control, baked into a deployment artifact, or written to disk. Rotating a credential is an operation on the secret, not a redeploy.
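A minimal sketch of the first two isolation points follows, assuming the row-scoping variant. All names here (`TenantContext`, `DocumentRepo`, `ModelRouter`, the `documents` table, the secret path format) are hypothetical, and `sqlite3` stands in for RDS Postgres only so the example runs without a database server. The router returns a reference to a secret rather than the key itself, on the assumption that the key is fetched from the secrets manager at call time.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List

# sqlite3 stands in for RDS Postgres so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    title TEXT NOT NULL
)
"""


@dataclass(frozen=True)
class TenantContext:
    """Built once from the verified token claim, never from request input."""
    tenant_id: str


class DocumentRepo:
    """Row scoping: every statement binds tenant_id from the context."""

    def __init__(self, conn: sqlite3.Connection, ctx: TenantContext):
        self._conn = conn
        self._ctx = ctx

    def add(self, title: str) -> None:
        self._conn.execute(
            "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
            (self._ctx.tenant_id, title),
        )

    def list_titles(self) -> List[str]:
        # Callers cannot pass a tenant_id, so they cannot widen the filter.
        rows = self._conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
            (self._ctx.tenant_id,),
        )
        return [title for (title,) in rows]


@dataclass(frozen=True)
class ModelRoute:
    model: str
    secret_name: str  # a reference into the secrets manager, not the key itself


class RoutingDenied(Exception):
    """The tenant asked for a model outside its own configuration."""


class ModelRouter:
    def __init__(self, routes: Dict[str, Dict[str, ModelRoute]]):
        self._routes = routes  # tenant_id -> {model name -> route}

    def resolve(self, ctx: TenantContext, model: str) -> ModelRoute:
        # Lookup is keyed on the verified tenant; unknown tenants get nothing.
        route = self._routes.get(ctx.tenant_id, {}).get(model)
        if route is None:
            raise RoutingDenied(f"{model!r} is not permitted for this tenant")
        return route
```

Neither class accepts a tenant identifier as a method argument. The design choice being illustrated is that the tenant is fixed when the object is constructed from the verified context, so a handler bug cannot substitute a different tenant later in the request.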
Layer 3 — Per-Tenant Cost Attribution
A multi-tenant LLM platform that cannot tell you what each tenant cost is a financial liability, because token spend is the dominant variable cost and it is invisible by default.
The framework meters usage at the application layer, keyed on the verified tenant identity, and records it to a usage store separate from operational data. Every model call is attributed to a tenant before the response returns. This produces a defensible per-tenant cost ledger that finance can reconcile against the provider invoice and that the client can use for usage-based or tiered pricing after transfer.
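As a rough illustration of the metering step, the sketch below records one ledger entry per model call and rolls the entries up by tenant. `UsageLedger`, `Price`, and `UsageRecord` are hypothetical names, the in-memory list stands in for the separate usage store, and any prices passed in are placeholders rather than real provider rates. `Decimal` is used so that totals reconcile exactly instead of drifting with floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class Price:
    """Price per 1,000 tokens for one model (placeholder values only)."""
    input_per_1k: Decimal
    output_per_1k: Decimal


@dataclass(frozen=True)
class UsageRecord:
    tenant_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost: Decimal


class UsageLedger:
    def __init__(self, prices: Dict[str, Price]):
        self._prices = prices
        self._records: List[UsageRecord] = []  # stand-in for the usage store

    def record(self, tenant_id: str, model: str,
               input_tokens: int, output_tokens: int) -> UsageRecord:
        # An unpriced model raises KeyError rather than being billed as zero.
        price = self._prices[model]
        cost = (Decimal(input_tokens) * price.input_per_1k
                + Decimal(output_tokens) * price.output_per_1k) / Decimal(1000)
        rec = UsageRecord(tenant_id, model, input_tokens, output_tokens, cost)
        self._records.append(rec)
        return rec

    def totals_by_tenant(self) -> Dict[str, Decimal]:
        totals: Dict[str, Decimal] = defaultdict(Decimal)
        for rec in self._records:
            totals[rec.tenant_id] += rec.cost
        return dict(totals)
```

In a request handler, `record` would be called with the verified tenant identifier after the provider responds and before the response is returned, so that no model call leaves the system unattributed.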
Why an HTTP API With Many Endpoints
The delivered system exposed roughly sixty-six endpoints behind a single HTTP API. That surface is not accidental sprawl; it reflects a deliberate choice to keep operations small and individually authorizable rather than building a handful of overloaded, mode-switching endpoints.
| Concern | Design choice |
|---|---|
| Authorization granularity | Each endpoint authorized independently at the edge |
| Blast radius | A bug in one operation does not expose unrelated operations |
| Auditability | Access logs map cleanly to discrete business actions |
| Cost control | Compute scales per operation (Lambda), not per monolith |
For a regulated buyer, the audit story is the payoff: every privileged action is its own endpoint with its own access record.
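One way to picture per-endpoint authorization is a route table in which each operation names the single permission it requires. The sketch below is an assumption for illustration only: the routes, the permission strings, and the idea that permissions arrive as a set alongside the verified token are all hypothetical and not taken from the delivered system.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical route table: one required permission per operation.
ROUTE_PERMISSIONS: Dict[Tuple[str, str], str] = {
    ("GET", "/documents"): "documents:read",
    ("POST", "/documents"): "documents:write",
    ("POST", "/tenants/rotate-key"): "admin:rotate-key",
}


def authorize_route(method: str, path: str,
                    granted: FrozenSet[str]) -> Dict[str, object]:
    """Return an audit-friendly decision for exactly one business action."""
    required: Optional[str] = ROUTE_PERMISSIONS.get((method.upper(), path))
    # Deny by default: an unregistered route has no permission that can match.
    allowed = required is not None and required in granted
    return {
        "action": f"{method.upper()} {path}",
        "required": required,
        "allowed": allowed,
    }
```

Because each decision names one action and one required permission, the resulting record maps directly onto a line in the access log, which is the audit property the table above describes.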
The Build-and-Transfer Delivery Model
This engagement was explicitly build-and-transfer. We did not build a platform to operate on the client’s behalf indefinitely—we built a platform the client’s own engineers would own, run, and extend. That changes how you build.
Documentation is a deliverable, not an afterthought. Architecture, runbooks, and the secrets-and-rotation procedure are written for an engineer who was not in the room during the build.
No proprietary lock-in. The stack is built on managed, widely-understood services—an HTTP API, serverless functions, RDS Postgres, a managed secrets store, and a standard JWT identity provider. The receiving team can hire for these skills.
Clean IP handoff. Source, infrastructure definitions, and credentials transfer to the client. The boundary is explicit: what we built, what they own, and where our responsibility ends.
Indicative Timeline
| Phase | Window | Focus |
|---|---|---|
| Architecture & isolation design | Weeks 1–2 | Tenant boundary model, identity flow, data layout |
| Core platform build | Weeks 3–7 | API surface, authorizer, model router, persistence |
| Cost attribution & hardening | Weeks 8–9 | Per-tenant metering, secrets, security review |
| Transfer & handoff | Weeks 10–11 | Documentation, runbooks, knowledge transfer, IP handoff |
The roughly eleven-week window is achievable precisely because the framework is reused, not reinvented, on each engagement. The isolation model, authorizer pattern, and metering approach are stable; what changes is the tenant policy, the data model, and the regulatory profile.
Security and Governance Posture
This framework is designed for environments that expect scrutiny. Several properties exist specifically to make a security review go faster:
- Authentication is enforced at a single, auditable chokepoint before business logic.
- Tenant isolation is enforced in depth, so review can verify each layer independently.
- Secrets never appear in code or artifacts, so source can be shared for review without exposing a credential.
- Per-tenant attribution provides an evidence trail for both cost and access.
We are deliberate about what we claim. This is an architecture pattern that supports a strong compliance posture; it is not a certification, and the receiving organization remains responsible for its own attestations and audits.
Applicability
This framework fits organizations that:
- Sell an LLM-powered product to multiple customers who will not accept shared-tenant data handling.
- Operate in or sell into regulated sectors—healthcare, finance, public sector—where isolation must be demonstrable.
- Want to own and run the platform themselves rather than depend on a vendor indefinitely.
- Need defensible per-tenant cost data to support pricing and finance.
It is overkill for a single-tenant internal tool and premature for a product still searching for its first customer. It is the right framework the moment a second regulated tenant is real.
Getting Started
Organizations evaluating a multi-tenant LLM build should assess four things before writing code:
- Tenant boundary requirements. How hard must isolation be—row-scoped, schema-separated, or fully separated per tenant?
- Identity strategy. Is there an existing identity provider, and can it issue scoped, signed tokens?
- Cost model. Will pricing be flat, tiered, or usage-based? This determines how granular metering must be.
- Ownership intent. Build-and-transfer, or vendor-operated? The answer reshapes documentation and stack choices.
Our build-and-transfer engagements are scoped to leave your team owning a platform they fully understand.
This framework reflects production engagement work by Data Science & Engineering Experts in regulated SaaS environments. Client details are anonymized. It is published as a reference architecture for teams evaluating secure multi-tenant LLM platforms and should be adapted to each organization’s regulatory and operational requirements.