LLM · Multi-Tenant SaaS · AI Governance · Cloud Architecture

Secure Multi-Tenant LLM Platform: A Build-and-Transfer Framework for Regulated Industries

The reference architecture behind a production multi-tenant LLM SaaS platform delivered in roughly eleven weeks, with hard tenant isolation, JWT-at-the-edge authentication, and a clean IP handoff.

DSE-Experts
Operator-led practice
May 27, 2026
7 min · 1,571 words


Executive Summary

Most LLM products start as a single-tenant prototype and break the moment a second customer signs. The shortcuts that make a demo fast—shared prompts, one database, an API key in the code—become liabilities the instant a regulated buyer asks how their data is isolated from everyone else’s.

We built this framework while delivering a production multi-tenant LLM SaaS platform under a build-and-transfer engagement: roughly an eleven-week effort from architecture to a working system, ending in a full intellectual-property handoff to the client’s own team. The brief was unforgiving—dozens of API endpoints, per-tenant data and model isolation, identity-provider authentication enforced before any business logic ran, and per-tenant cost attribution that finance could actually reconcile.

The result is a reference architecture, not a product. It assumes the buyer operates in a regulated environment, expects auditors, and intends to own and run the system after we leave. Every decision below is shaped by those three constraints.

The Multi-Tenant Problem Nobody Wants to Talk About

“Multi-tenant” gets used loosely. There is a meaningful difference between a system where tenants are a column in a shared table and a system where a tenant boundary is enforced at every layer—identity, routing, storage, secrets, and billing.

For a regulated client, the loose version is disqualifying. The questions that decide a deal are not about model quality:

A platform answers these with architecture, not policy documents. The framework below is organized around enforcing the tenant boundary in depth, so that no single failure collapses isolation.

Framework Architecture

Request Lifecycle

┌──────────────────────────────────────────────────────────────────────┐
│                            Client / Tenant App                         │
│                  (carries a signed JWT, scoped to one tenant)          │
└───────────────────────────────────┬────────────────────────────────────┘
                                     │  Authorization: Bearer <JWT>
                                     ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         HTTP API (≈66 endpoints)                       │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Gateway Authorizer  ── validates JWT (RS256 / JWKS) ───────────│  │
│  │   • signature + expiry      • tenant_id claim extracted          │  │
│  │   • rejected here → request never reaches business logic         │  │
│  └────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────┬────────────────────────────────────┘
                                     │  (authorized + tenant context)
                                     ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        Application Layer (Lambda)                      │
│   ┌──────────────┐   ┌──────────────────┐   ┌────────────────────┐    │
│   │ Tenant-scoped│   │  Model Router    │   │  Cost Metering     │    │
│   │ data access  │   │  (per-tenant     │   │  (per-tenant       │    │
│   │              │   │   model + key)   │   │   token/usage)     │    │
│   └──────┬───────┘   └────────┬─────────┘   └─────────┬──────────┘    │
└──────────┼────────────────────┼───────────────────────┼───────────────┘
           ▼                     ▼                       ▼
   ┌──────────────┐     ┌──────────────┐        ┌──────────────┐
   │ RDS Postgres │     │   Secrets    │        │  Usage /     │
   │ (row + schema│     │   Manager    │        │  Billing     │
   │  scoped)     │     │ (no keys in  │        │  Store       │
   └──────────────┘     │   code)      │        └──────────────┘
                        └──────────────┘

Layer 1 — Identity at the Edge

Authentication is enforced before the request reaches any business logic. The client carries a JSON Web Token issued by a managed identity provider (a Clerk-style flow), signed with RS256 and verifiable against a published JWKS endpoint.

An authorizer sitting in front of the HTTP API validates the signature against the rotating public keys, checks expiry, and extracts the tenant_id claim. A request with a missing, expired, or malformed token is rejected at the gateway—it never invokes a function, never touches the database, and never appears in application logs as anything but a denied request.

This matters for two reasons. First, the most expensive part of the stack (model inference) is never reached by unauthenticated traffic. Second, the tenant identity arrives as a cryptographically signed claim, not a value the application has to look up or trust from the request body.
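
A minimal sketch of the decision the authorizer makes, under stated assumptions. Signature verification against the JWKS is delegated to a `verify` callable that would come from a JWT library in practice; that callable, the `Decision` type, and the `authorize` function are hypothetical names for illustration, not the delivered code. What the sketch shows is the deny-by-default ordering: token present, signature valid, not expired, tenant claim present.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical: a function that checks the RS256 signature against the
# identity provider's JWKS and returns the decoded claims, raising on failure.
VerifyFn = Callable[[str], Mapping[str, object]]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    tenant_id: Optional[str] = None
    reason: str = ""


def authorize(auth_header: Optional[str], verify: VerifyFn,
              now: Optional[float] = None) -> Decision:
    """Deny by default; allow only a verified, unexpired, tenant-scoped token."""
    now = time.time() if now is None else now
    if not auth_header or not auth_header.startswith("Bearer "):
        return Decision(False, reason="missing bearer token")
    token = auth_header[len("Bearer "):].strip()
    try:
        claims = verify(token)  # signature check happens inside verify
    except Exception:
        return Decision(False, reason="invalid token")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return Decision(False, reason="expired token")
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        return Decision(False, reason="missing tenant_id claim")
    # Only this path hands a tenant context to the application layer.
    return Decision(True, tenant_id=tenant_id)
```

The tenant identity leaves this function only on the final line, so downstream code never has to read it from the request body.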

Layer 2 — Tenant Isolation in Depth

The tenant_id extracted at the edge becomes the spine of every downstream decision. Isolation is enforced at three points so that no single bug breaks the boundary:
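
For the storage layer, a minimal sketch of row-scoped access might look like the following. It assumes a hypothetical `documents` table and a hypothetical `TenantRepo` class, and it uses the standard-library `sqlite3` module as a stand-in for RDS Postgres so the example is self-contained. The idea it illustrates is that the repository is bound to one verified tenant at construction, so no individual query can forget the tenant filter or accept a different tenant from the caller.

```python
import sqlite3
from typing import List, Optional


class TenantRepo:
    """Data access bound to a single tenant for its whole lifetime."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def add_document(self, title: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
            (self._tenant_id, title),
        )
        return cur.lastrowid

    def list_titles(self) -> List[str]:
        rows = self._conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]

    def get_title(self, doc_id: int) -> Optional[str]:
        # Knowing another tenant's row id is not enough: the tenant filter
        # is part of every lookup.
        row = self._conn.execute(
            "SELECT title FROM documents WHERE id = ? AND tenant_id = ?",
            (doc_id, self._tenant_id),
        ).fetchone()
        return row[0] if row else None


def open_demo_db() -> sqlite3.Connection:
    """In-memory database with the hypothetical schema used above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT NOT NULL)"
    )
    return conn
```

On Postgres, row-level security policies could back this up at the database itself, so an application bug alone would not expose another tenant's rows. Whether the delivered system used them is not stated here.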

Layer 3 — Per-Tenant Cost Attribution

A multi-tenant LLM platform that cannot tell you what each tenant cost is a financial liability, because token spend is the dominant variable cost and it is invisible by default.

The framework meters usage at the application layer, keyed on the verified tenant identity, and records it to a usage store separate from operational data. Every model call is attributed to a tenant before the response returns. This produces a defensible per-tenant cost ledger that finance can reconcile against the provider invoice and that the client can use for usage-based or tiered pricing after transfer.
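
A rough sketch of such a ledger follows. The model name, the per-1,000-token prices, and the `UsageRecord` and `UsageLedger` types are all hypothetical; real rates come from the provider's rate card. It shows two things: each call is recorded against a tenant, and the ledger total can be compared with an invoice total to surface spend that was never attributed.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List

# Hypothetical USD prices per 1,000 tokens, for illustration only.
PRICE_PER_1K = {
    "model-a": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


@dataclass(frozen=True)
class UsageRecord:
    tenant_id: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost(self) -> Decimal:
        price = PRICE_PER_1K[self.model]
        return (price["input"] * self.input_tokens
                + price["output"] * self.output_tokens) / 1000


class UsageLedger:
    """Append-only usage store, kept apart from operational data."""

    def __init__(self) -> None:
        self._records: List[UsageRecord] = []

    def record(self, rec: UsageRecord) -> None:
        if not rec.tenant_id:
            raise ValueError("usage must be attributed to a tenant")
        self._records.append(rec)

    def cost_by_tenant(self) -> Dict[str, Decimal]:
        totals: Dict[str, Decimal] = defaultdict(Decimal)
        for rec in self._records:
            totals[rec.tenant_id] += rec.cost
        return dict(totals)

    def unattributed(self, invoice_total: Decimal) -> Decimal:
        """Invoice amount not explained by the ledger; ideally zero."""
        attributed = sum(self.cost_by_tenant().values(), Decimal("0"))
        return invoice_total - attributed
```

`Decimal` is used rather than floats so that per-tenant totals sum exactly, which matters when the ledger is reconciled line by line.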

Why an HTTP API With Many Endpoints

The delivered system exposed roughly 66 endpoints behind a single HTTP API. That surface is not accidental sprawl. It reflects a deliberate choice to keep operations small and individually authorizable rather than building a handful of overloaded, mode-switching endpoints.

Concern                      Design choice
Authorization granularity    Each endpoint authorized independently at the edge
Blast radius                 A bug in one operation does not expose unrelated operations
Auditability                 Access logs map cleanly to discrete business actions
Cost control                 Compute scales per operation (Lambda), not per monolith

For a regulated buyer, the audit story is the payoff: every privileged action is its own endpoint with its own access record.
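
One way to picture the one-endpoint-one-action idea is a route table in which every operation names the single permission it requires. The routes, permission strings, and the `check_access` helper below are hypothetical and exist only to illustrate the pattern; the delivered system's actual routes are not listed in this write-up.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical routes and permission names, for illustration only.
ROUTES: Dict[Tuple[str, str], str] = {
    ("POST", "/completions"): "completions:create",
    ("GET", "/usage"): "usage:read",
    ("PUT", "/tenant/model"): "tenant:configure",
}


@dataclass(frozen=True)
class AuditRecord:
    tenant_id: str
    action: str
    allowed: bool


def check_access(method: str, path: str, tenant_id: str,
                 granted: FrozenSet[str]) -> AuditRecord:
    """Return one audit record per request, naming the business action."""
    required = ROUTES.get((method, path))
    if required is None:
        # Unknown routes are denied rather than falling through.
        return AuditRecord(tenant_id, f"unknown:{method} {path}", False)
    return AuditRecord(tenant_id, required, required in granted)
```

Because each route maps to exactly one action, an access log built from these records can be read as a list of business actions per tenant, with no need to inspect request bodies to learn what a call did.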

The Build-and-Transfer Delivery Model

This engagement was explicitly build-and-transfer. We did not build a platform to operate on the client’s behalf indefinitely—we built a platform the client’s own engineers would own, run, and extend. That changes how you build.

Documentation is a deliverable, not an afterthought. Architecture, runbooks, and the secrets-and-rotation procedure are written for an engineer who was not in the room during the build.

No proprietary lock-in. The stack is built on managed, widely understood services: an HTTP API, serverless functions, RDS Postgres, a managed secrets store, and a standard JWT identity provider. The receiving team can hire for these skills.

Clean IP handoff. Source, infrastructure definitions, and credentials transfer to the client. The boundary is explicit: what we built, what they own, and where our responsibility ends.

Indicative Timeline

Phase                              Window         Focus
Architecture & isolation design    Weeks 1–2      Tenant boundary model, identity flow, data layout
Core platform build                Weeks 3–7      API surface, authorizer, model router, persistence
Cost attribution & hardening       Weeks 8–9      Per-tenant metering, secrets, security review
Transfer & handoff                 Weeks 10–11    Documentation, runbooks, knowledge transfer, IP handoff

The roughly eleven-week window is achievable precisely because the framework is reused, not reinvented, on each engagement. The isolation model, authorizer pattern, and metering approach are stable; what changes is the tenant policy, the data model, and the regulatory profile.

Security and Governance Posture

This framework is designed for environments that expect scrutiny. Several properties exist specifically to make a security review go faster:

We are deliberate about what we claim. This is an architecture pattern that supports a strong compliance posture; it is not a certification, and the receiving organization remains responsible for its own attestations and audits.

Applicability

This framework fits organizations that:

It is overkill for a single-tenant internal tool and premature for a product still searching for its first customer. It is the right framework the moment a second regulated tenant is real.

Getting Started

Organizations evaluating a multi-tenant LLM build should assess four things before writing code:

  1. Tenant boundary requirements. How hard must isolation be—row-scoped, schema-separated, or fully separated per tenant?
  2. Identity strategy. Is there an existing identity provider, and can it issue scoped, signed tokens?
  3. Cost model. Will pricing be flat, tiered, or usage-based? This determines how granular metering must be.
  4. Ownership intent. Build-and-transfer, or vendor-operated? The answer reshapes documentation and stack choices.

Our build-and-transfer engagements are scoped to leave your team owning a platform they fully understand.


This framework reflects production engagement work by Data Science & Engineering Experts in regulated SaaS environments. Client details are anonymized. It is published as a reference architecture for teams evaluating secure multi-tenant LLM platforms and should be adapted to each organization’s regulatory and operational requirements.

Founder · Principal Engineer
Data & AI engineer · 10+ yrs hands-on

Writes most of the long-form here. Lives in the codebase. Active on GitHub and LinkedIn.
