Case Study: PrivateStack — Secure Multi-Tenant AI

delivery: 11 wks (0 → customer-zero)
endpoints: 66 (production API surface)
p95 latency: 181 ms (/chat route, last 30d)
uptime · 12mo: 99.94% (since hand-off)
providers routed: 5 (cost-optimized via LiteLLM)

The brief

Engagement type: AI Engineering · fixed-fee · 11 wks
Industry: Regulated B2B SaaS
Stage: Series B
Team: Principal + Staff Eng + SME
Cadence: Weekly demo · decision log
Cloud: AWS us-east-1
Hand-off: Full IP + 23-pg runbook

From a Notion doc to customer-zero in eleven weeks.

A regulated-industry B2B SaaS needed a private, multi-tenant LLM platform their security team could approve. They'd tried two earlier engagements with body-shop consultancies. One delivered a deck. The other delivered a Streamlit demo.

We started from the brief, not from a template. Week one was a threat model and architecture review. Week eleven was customer-zero in production. The hand-off included a 23-page runbook, full IP transfer, and a 30-day post-launch support window.

The system has run for twelve months since hand-off without our intervention. Their on-call team operates it from the runbook we left.

§01 Architecture.
Production reference.

Every box has a runbook.
Every arrow has an ADR.

AWS-native by default. Bedrock as the primary inference path, LiteLLM as the routing layer, pgvector for retrieval. Per-tenant cost ceilings and observability throughout.

PrivateStack — production reference architecture

fig 01 · v0.84

→ ingress & auth: edge (CloudFront) · waf (AWS WAF) · identity (Clerk · JWT · OIDC) · api gateway (66 routes)
→ compute (λ): auth + tenant · inference (LiteLLM router) · billing + usage
→ data & eval: store (pgvector · BM25) · models (Bedrock · vLLM fallback) · evals (CI · 842 golden)
→ observability & ops: traces (per-tenant) · logs (structured) · cost (route-attributed) · runbook (23 pp)

§02 Eleven weeks, in order.
What we shipped, when.

No surprises after week one.

Fixed scope, a written decision log on every call, and a demo every Friday. The scope doc we signed in week zero matches the artifacts handed off in week eleven.

W0

Scope.

Discovery call. Written scope, deliverables, milestones, fee. 48-hour fixed-fee quote.

Out: Scope doc · MSA · NDA

W1

Architecture & threat model.

System diagram, data-flow, ADR-001 (routing strategy), threat model. End-of-week demo of the scaffolding.

Out: Arch diagram · TM doc · ADRs 1–4

W2

Auth + tenant schema.

Clerk JWT integration, tenant table, RLS policies, API skeleton with three live routes. CI green.

Out: Auth service · tenant model
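
For readers unfamiliar with the pattern, here is a minimal sketch of tenant isolation with Postgres row-level security. It is an illustration under stated assumptions, not the code from this engagement: the `documents` table, the `tenant_id` column, and the `app.tenant_id` setting are hypothetical names, and the statement is meant to be run through whatever Postgres driver the service already uses.

```python
"""Sketch of tenant isolation with Postgres row-level security (RLS).

Hypothetical names: the `documents` table, the `tenant_id` column, and the
`app.tenant_id` setting are illustrative, not taken from the engagement.
"""
import uuid

# One-time DDL: enable RLS and restrict every row to the current tenant.
RLS_DDL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""


def tenant_scope_statement(tenant_id: str) -> tuple[str, tuple[str]]:
    """Return a parameterized statement that pins a transaction to one tenant.

    set_config(name, value, true) scopes the setting to the current
    transaction, so a pooled connection does not carry it to the next request.
    """
    # Reject anything that is not a well-formed UUID before it reaches SQL.
    normalized = str(uuid.UUID(tenant_id))
    return "SELECT set_config('app.tenant_id', %s, true)", (normalized,)
```

The tenant ID would come from a verified JWT claim, and the statement would run at the start of each request's transaction. Note that table owners bypass RLS unless `FORCE ROW LEVEL SECURITY` is set, so the application role should not own the table.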

W3–5

Build out the 66 endpoints.

Inference routes, admin routes, embedding routes. LiteLLM router with five providers, cost ceilings per tenant, fallback strategy.

Out: 66 routes · LiteLLM config
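
The idea of a per-tenant cost ceiling combined with provider fallback can be sketched in plain Python. This is a stand-in for illustration only: it does not use the LiteLLM API, and the `Provider` and `TenantRouter` classes, the flat per-call cost, and the ordering rule are all assumptions rather than the production configuration.

```python
from dataclasses import dataclass, field
from typing import Callable


class CeilingExceeded(Exception):
    """Raised when no provider fits inside the tenant's remaining budget."""


@dataclass
class Provider:
    name: str
    cost_per_call_usd: float      # flat per-call cost, a simplification
    call: Callable[[str], str]    # hypothetical provider client


@dataclass
class TenantRouter:
    providers: list[Provider]     # tried in order; caller decides the order
    ceilings_usd: dict[str, float]
    spent_usd: dict[str, float] = field(default_factory=dict)

    def complete(self, tenant: str, prompt: str) -> tuple[str, str]:
        """Try providers in order; return (provider_name, response)."""
        spent = self.spent_usd.get(tenant, 0.0)
        ceiling = self.ceilings_usd[tenant]
        last_error = None
        for provider in self.providers:
            if spent + provider.cost_per_call_usd > ceiling:
                continue  # this call would break the tenant's ceiling
            try:
                response = provider.call(prompt)
            except Exception as exc:  # real code should catch narrower errors
                last_error = exc
                continue  # fall back to the next provider
            self.spent_usd[tenant] = spent + provider.cost_per_call_usd
            return provider.name, response
        if last_error is not None:
            raise last_error
        raise CeilingExceeded(tenant)
```

Spend is recorded only after a successful call, so a failed provider does not count against the tenant. A production router would also meter by tokens rather than by call and persist spend outside process memory.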

W6

Retrieval + eval harness.

pgvector + BM25 hybrid retrieval. Eval harness with 842 golden cases. Drift baseline established.

Out: Retrieval lib · eval harness
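
Hybrid retrieval needs some way to merge a vector-similarity ranking with a BM25 ranking. The case study does not say which fusion method was used, so the sketch below assumes reciprocal rank fusion (RRF), one common choice. The function name and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; best-first in, best-first out.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # e.g. from a pgvector similarity query
bm25_hits = ["b", "d", "a"]     # e.g. from a keyword search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# fused == ["b", "a", "d", "c"]
```

RRF uses only ranks, so the two retrievers' raw scores never need to be put on a common scale.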

W7–8

Billing, usage, admin console.

Stripe billing, per-tenant usage metering, admin console for support. PII scrubbing wired through every route.

Out: Billing · usage · admin UI
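
As an illustration of what a PII scrubbing step can look like, here is a deliberately narrow sketch that masks email addresses and US Social Security numbers before text is logged or sent to a model. The patterns, the placeholders, and the `scrub` function are assumptions for this example. They are not the rules used in production, and real coverage needs far more than two regular expressions.

```python
import re

# Illustrative patterns only; each pairs a placeholder with a compiled regex.
_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def scrub(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(scrub("Mail jane.doe@example.com, SSN 123-45-6789."))
# prints: Mail [EMAIL], SSN [SSN].
```

Running one shared function at the edge of every route is what makes "wired through every route" checkable: a test can assert that no handler bypasses it.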

W9

Security review & red-team.

IAM hardening, secrets rotation, prompt-injection red-team, fixes for 8 findings. Bedrock IAM clean.

Out: Red-team report · IAM policies

W10

Observability & load test.

Traces, logs, cost dashboards per tenant. Alerting wired. Load test to 5× expected peak. p95 stable.

Out: Dashboards · alerts · load report
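
For reference, a p95 figure from a load test can be computed as below. This sketch assumes the nearest-rank percentile definition (monitoring tools may interpolate instead). The function names and sample data are hypothetical, and the 500 ms default is the SLO quoted for the /chat route.

```python
import math


def percentile_nearest_rank(samples_ms: list[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank of the smallest value covering pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_slo(samples_ms: list[float], slo_ms: float = 500.0,
              pct: float = 95.0) -> bool:
    """True when the chosen percentile is at or under the SLO."""
    return percentile_nearest_rank(samples_ms, pct) <= slo_ms


latencies = [float(ms) for ms in range(1, 101)]  # synthetic: 1..100 ms
p95 = percentile_nearest_rank(latencies, 95)     # 95.0
```

Checking the percentile at each load step, rather than once at the end, is what shows whether p95 stays stable as traffic climbs toward the 5× target.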

W11

IP transfer & customer-zero launch.

Full IP transfer. 23-page runbook handed to their on-call. Customer-zero traffic enabled. 30-day support clock starts.

Out: Runbook · IP transfer · launch

§03 Outcomes.
Numbers, attached.

A working system, not a slide deck.

Metrics on the day we handed off, and what they look like twelve months later. References available on request.

uptime · 12mo

99.94%

Since hand-off, twelve months running, on their on-call.
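
To put the percentage in concrete terms, the arithmetic below converts an uptime figure into total downtime over a period. It assumes a 365-day year and that the figure measures whole-service availability. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(uptime_pct: float, period_hours: float) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return (100.0 - uptime_pct) / 100.0 * period_hours


hours = round(downtime_hours(99.94, HOURS_PER_YEAR), 2)  # 5.26
```

Under those assumptions, 99.94% over twelve months corresponds to a little over five hours of total downtime.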

p95 latency

181ms

/chat route. Stable across model swaps, well under the 500ms SLO.

cost / 1k req

−42%

vs. all-OpenAI baseline, after LiteLLM routing went live.

on-call burden

~0/wk

Their engineers operate from the runbook. We've answered fewer than three follow-ups in twelve months.

§04 The stack.
What's in production today.

AWS-native. Boringly chosen.

No exotic infrastructure. Every choice is one that a customer's own on-call team can operate without us.

Auth & identity

Clerk
JWT · OIDC
Tenant RLS
Cognito as fallback

API & compute

API Gateway · 66 routes
Lambda (Node 20)
Step Functions
SQS · EventBridge

Models & retrieval

Bedrock (primary)
LiteLLM · 5 providers
pgvector + BM25
vLLM fallback

Ops & security

OpenTelemetry traces
Per-tenant cost dashboards
WAF · GuardDuty
Runbook (23pp)

Other engagements

case-02 · federal · 24 mo

Federal contract intelligence platform.

SAM.gov ingestion · MongoDB buyer maps · automated proposal drafting.

case-03 · commercial · ongoing

24/7 algorithmic trading inference.

GPU-shared inference · MongoDB time-series · risk circuit breakers.

case-04 · commercial · 8 wks

Computer-vision surveillance pipeline.

OSNet / InsightFace re-ID across 8 RTSP streams · edge GPU.

§ Next step

Not sure which of these is you?

Tell us what's broken in a paragraph and a principal reads it directly — or walk the ladder from a low-commitment first engagement up to retained work.
