A private, multi-tenant LLM platform for regulated industries — auth, billing, retrieval, routing across five providers, observability — built end-to-end and handed off with a runbook a third party could operate.
A regulated-industry B2B SaaS needed a private, multi-tenant LLM platform their security team could approve. They'd tried two earlier engagements with body-shop consultancies. One delivered a deck. The other delivered a Streamlit demo.
We started from the brief, not from a template. Week one was a threat model and architecture review. Week eleven was customer-zero in production. The handoff included a 23-page runbook, full IP transfer, and a 30-day post-launch support window.
The system has run for twelve months since hand-off without our intervention. Their on-call team operates it from the runbook we left.
AWS-native by default. Bedrock as the primary inference path, LiteLLM as the routing layer, pgvector for retrieval. Per-tenant cost ceilings and observability throughout.
Fixed-scope, written decision log on every call, weekly demo on Friday. The scope doc we signed in week zero matches the artifacts handed off in week eleven.
Discovery call. Written scope, deliverables, milestones, fee. 48-hour fixed-fee quote.
System diagram, data-flow, ADR-001 (routing strategy), threat model. End-of-week demo of the scaffolding.
Clerk JWT integration, tenant table, RLS policies, API skeleton with three live routes. CI green.
Inference routes, admin routes, embedding routes. LiteLLM router with five providers, cost ceilings per tenant, fallback strategy.
pgvector + BM25 hybrid retrieval. Eval harness with 842 golden cases. Drift baseline established.
Stripe billing, per-tenant usage metering, admin console for support. PII scrubbing wired through every route.
IAM hardening, secrets rotation, prompt-injection red-team, fixes for 8 findings. Bedrock IAM clean.
Traces, logs, cost dashboards per tenant. Alerting wired. Load test to 5× expected peak. p95 stable.
Full IP transfer. 23-page runbook handed to their on-call. Customer-zero traffic enabled. 30-day support clock starts.
We expected a deck and got a deploy.
The runbook outlived three of our engineers. — CTO, anonymized · Series-B SaaS · reference on request
Metrics on the day we handed off, and what they look like twelve months later. References available on request.
No exotic infrastructure. Every choice is one that a customer's own on-call team can operate without us.
We'll tell you in 48 hours whether it's a fit, scope it if it is, and refer you elsewhere if it isn't. A principal reads every message.
SAM.gov ingestion · MongoDB buyer maps · automated proposal drafting.
GPU-shared inference · MongoDB time-series · risk circuit breakers.
OSNet / InsightFace re-ID across 8 RTSP streams · edge GPU.