AI isn’t a feature. It’s a layer. And if your AI layer is welded to a single vendor, you haven’t built intelligence into your product — you’ve built a dependency.
The Prompt Is the Product
When an AI hiring decisioning platform ships twelve distinct AI features — evaluation framework generation, transcript processing, interview summarisation, an AI coaching assistant, resume parsing, inline validation, style suggestions, impact metric recommendations, a conversational Q&A system, and more — the natural instinct is to treat each one as a separate integration. A different model here. A bespoke API call there. Maybe a managed assistant service with its own state management and vector stores.
That instinct is wrong.
Every AI feature in a production application is, at its core, a prompt. A system message that tells the model what role to play, what context matters, and what output format to produce. The model is interchangeable. The prompt is the intellectual property. The moment you treat the prompt as disposable and the model as permanent, you’ve inverted the value hierarchy — and locked yourself into a vendor relationship that will cost you when pricing changes, rate limits tighten, or a better model launches on a competing platform.
The Anchor Decision: Prompts in the Database
The single most important architectural decision for the AI layer was this: store every prompt in a database table, editable through an admin portal, routed through a model-agnostic gateway.
The table — llm_prompts — contains twelve rows. Each row has a slug, a system prompt, a model identifier, temperature and token settings, and metadata for version tracking. That’s it. Twelve rows that power twelve AI features across the entire platform.
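A minimal sketch of what such a table might look like, using SQLite for illustration. The text names the slug, system prompt, model identifier, temperature and token settings, and version metadata; the exact column names and defaults here are assumptions.

```python
import sqlite3

# Illustrative schema for the llm_prompts table described above.
# Column names beyond those mentioned in the text are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE llm_prompts (
        slug          TEXT PRIMARY KEY,   -- stable feature identifier
        system_prompt TEXT NOT NULL,      -- the intellectual property
        model         TEXT NOT NULL,      -- model identifier, resolved by the gateway
        temperature   REAL DEFAULT 0.2,
        max_tokens    INTEGER DEFAULT 1024,
        version       INTEGER DEFAULT 1,  -- bumped on every edit for audit/revert
        updated_at    TEXT
    )
""")
conn.execute(
    "INSERT INTO llm_prompts (slug, system_prompt, model) VALUES (?, ?, ?)",
    ("transcript-processing", "Clean the raw transcript.", "fast-model"),
)
row = conn.execute(
    "SELECT model FROM llm_prompts WHERE slug = ?", ("transcript-processing",)
).fetchone()
print(row[0])  # fast-model
```

Answering "what does this feature actually do?" is then a `SELECT`, not a code search.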
This wasn’t cleverness for its own sake. It was a direct response to what we’d inherited: over 2,000 stateful assistant instances on a managed AI service, each with its own configuration, its own conversation thread, its own vector store. Debugging meant hunting through thousands of instances to find the right one. Updating a prompt meant deploying code. Testing a new model meant rewriting integration logic. The operational overhead was staggering — and invisible to the product owner, because it all lived behind API calls that “just worked” until they didn’t.
Three Layers, No Magic
The architecture is deliberately boring. Three layers, each with a single responsibility.
Layer 1: The database. The llm_prompts table stores every prompt, its model assignment, and its configuration parameters. A prompt is a first-class entity in the data model — versioned, auditable, editable. When someone asks “what does the transcript processing feature actually do?”, the answer is a database query, not a code archaeology expedition.
Layer 2: The LLM service. A single service class that accepts a prompt slug, user input, and optional context. It looks up the prompt configuration from the database, constructs the API call, routes it through a model-agnostic gateway, and returns the response. The service doesn’t know or care which model it’s calling. It builds a standard chat completion request — system message, user message, parameters — and sends it through the gateway. Every AI feature in the application calls this same service. No special cases. No feature-specific integration code.
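In outline, the service could look like the sketch below. The class and parameter names are hypothetical; the prompt loader and gateway are injected so the service stays ignorant of both the database driver and the model provider.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class PromptConfig:
    system_prompt: str
    model: str
    temperature: float
    max_tokens: int

class LLMService:
    """Single entry point for every AI feature: look up the prompt by slug,
    build a standard chat-completion request, send it through the gateway."""

    def __init__(self, load_prompt: Callable[[str], PromptConfig],
                 gateway: Callable[[dict], str]):
        self.load_prompt = load_prompt   # e.g. a query against llm_prompts
        self.gateway = gateway           # model-agnostic routing layer

    def run(self, slug: str, user_input: str, context: Optional[str] = None) -> str:
        cfg = self.load_prompt(slug)
        user_content = f"{context}\n\n{user_input}" if context else user_input
        request = {
            "model": cfg.model,
            "temperature": cfg.temperature,
            "max_tokens": cfg.max_tokens,
            "messages": [
                {"role": "system", "content": cfg.system_prompt},
                {"role": "user", "content": user_content},
            ],
        }
        return self.gateway(request)

# Stub loader and gateway for illustration: the gateway just echoes the model.
service = LLMService(
    load_prompt=lambda slug: PromptConfig("You clean transcripts.", "fast-model", 0.2, 1024),
    gateway=lambda req: f"called {req['model']}",
)
print(service.run("transcript-processing", "raw transcript text"))  # called fast-model
```

Because every feature goes through `run(slug, ...)`, there is exactly one place where requests are constructed, logged, and measured.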
Layer 3: The admin UI. A portal where the product owner can view every prompt, edit the system message, adjust parameters, test with sample inputs, and see usage statistics. No deployment required. No developer required. The person closest to the business logic — the product owner — has direct control over what the AI actually says and does.
This is the pattern that matters: database, service, admin UI. It’s not novel. It’s not exciting. It works.
Model-Agnostic Routing: Why It Matters
The platform routes all AI calls through OpenRouter, with Amazon Bedrock as the upstream provider via BYOK (Bring Your Own Key). In practice, this means the application never calls a model provider directly. It calls a routing layer that can direct traffic to any supported model without changing application code.
Today, the twelve features use two models. Complex reasoning tasks — evaluation framework generation, interview summarisation, the coaching assistant — run on a capable reasoning model. Fast, frequent tasks — inline validation, style suggestions, resume parsing — run on a smaller, cheaper model optimised for speed.
Switching models is a database update. Not a code change. Not a deployment. A product owner can test whether a new model produces better evaluation frameworks by changing one field in the admin portal, running the test panel, and comparing outputs. If it’s better, it’s live. If it’s worse, revert the field. The entire experiment takes minutes.
This matters because the AI model landscape changes every quarter. New models launch. Pricing shifts. Capabilities improve. If your AI features are hardcoded to a specific provider’s SDK, every model change is a development project. If your AI features route through an agnostic gateway with prompt configuration in the database, every model change is a configuration update.
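The mechanics are almost trivially simple, which is the point. A toy sketch (model names invented) of model assignment living in data rather than code:

```python
# Model assignment lives in data, not code: swapping models is a field update,
# and the very next request picks it up. All names here are illustrative.
prompt_config = {"eval-framework": {"model": "reasoning-model-v1"}}

def build_request(slug: str, user_input: str) -> dict:
    cfg = prompt_config[slug]
    return {"model": cfg["model"],
            "messages": [{"role": "user", "content": user_input}]}

assert build_request("eval-framework", "role description")["model"] == "reasoning-model-v1"

# The entire "deployment": one field changes, e.g. via the admin portal.
prompt_config["eval-framework"]["model"] = "reasoning-model-v2"
assert build_request("eval-framework", "role description")["model"] == "reasoning-model-v2"
```

With a hardcoded SDK, the equivalent change touches application code, tests, and a release cycle.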
Replacing 2,000 Stateful Instances
The previous architecture used a managed assistants API — stateful conversation threads with per-application vector stores. Each application in the system spawned its own set of assistant instances, each maintaining conversation history and attached documents.
The problems were predictable. State management created complexity: what happens when a thread gets too long? When a vector store needs updating? When an assistant’s behaviour drifts because its conversation history includes outdated context? Every stateful instance was a potential debugging nightmare, and with 2,000 of them, nightmares were frequent.
The replacement was stateless chat completions. Every AI call is independent. The application constructs the full context for each request — system prompt from the database, relevant data from the application state, user input — and sends a self-contained request. No threads. No vector stores. No accumulated state that can drift or corrupt.
Stateless is simpler, cheaper, and more debuggable. When something goes wrong, you can reproduce the exact inputs and outputs. When you need to change behaviour, you change the prompt in the database and every subsequent call uses the new version. There’s no “but what about the existing threads?” problem because there are no existing threads.
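The reproducibility claim falls directly out of statelessness: the request is a pure function of its inputs. A sketch (function names assumed) of why identical inputs always yield an identical, replayable request:

```python
import hashlib
import json

def build_stateless_request(system_prompt: str, app_state: dict, user_input: str) -> dict:
    """Every call carries its full context; nothing depends on prior threads."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user",
             "content": json.dumps(app_state, sort_keys=True) + "\n" + user_input},
        ]
    }

def fingerprint(request: dict) -> str:
    # Canonical serialisation so logically equal requests hash identically.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

a = build_stateless_request("Summarise.", {"interview_id": 7}, "transcript text")
b = build_stateless_request("Summarise.", {"interview_id": 7}, "transcript text")
assert fingerprint(a) == fingerprint(b)  # same inputs reproduce the exact request
```

Log the inputs and you can replay any failed call byte-for-byte; with stateful threads, the "input" included thousands of lines of accumulated history.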
The Admin Portal: Where Operations Meets Intelligence
The admin portal transforms AI from a development concern to an operations concern. The product owner can:
Edit prompts in real time. The transcript processing feature isn’t producing clean enough output? Edit the system prompt. Add a constraint. Adjust the output format. Test it. Ship it. No developer involvement. No deployment pipeline. No waiting for the next sprint.
Test before shipping. Every prompt has a test panel. Paste in sample input, select a model, run the prompt, review the output. Side-by-side comparison when evaluating prompt changes. The feedback loop is measured in seconds, not days.
Monitor usage and costs. A dashboard showing calls per feature, average latency, token consumption, error rates, and estimated cost. When the product owner asks “how much are we spending on AI?”, the answer is a glance at a dashboard, not a request to the finance team to parse vendor invoices.
Track versions. Every prompt edit is logged. If a change produces worse results, revert to the previous version. Audit trails for compliance. History for learning what works.
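One simple way to get this behaviour is an append-only version log, where revert is just re-publishing an earlier version. A sketch under that assumption (the real system's storage details aren't described):

```python
# Append-only version log: every edit is a new row, and a revert is itself
# a logged edit. In the real system this would be a table keyed by (slug, version).
versions = []

def save_prompt(slug: str, text: str) -> int:
    version = sum(1 for v in versions if v["slug"] == slug) + 1
    versions.append({"slug": slug, "version": version, "text": text})
    return version

def revert(slug: str, to_version: int) -> int:
    old = next(v for v in versions
               if v["slug"] == slug and v["version"] == to_version)
    return save_prompt(slug, old["text"])  # revert appears in the audit trail

save_prompt("coaching", "v1 guidance prompt")
save_prompt("coaching", "v2 guidance prompt")  # produced worse results in testing
revert("coaching", 1)
current = [v for v in versions if v["slug"] == "coaching"][-1]
assert current["text"] == "v1 guidance prompt" and current["version"] == 3
```

Nothing is ever overwritten, so the audit trail and the revert mechanism are the same data.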
This is what “AI as infrastructure” looks like in practice. The AI features aren’t magic. They’re configuration — managed with the same tools and processes as any other operational concern.
The Twelve Features
Each feature is a row in the database. Each row is a slug, a system prompt, and a model assignment. Here’s what they do:
1. Evaluation framework generation (comprehensive) — Creates detailed assessment criteria from role descriptions. Complex reasoning. Capable model.
2. Evaluation framework generation (condensed) — A shorter, focused variant for quick assessments. Same reasoning requirements, different output format.
3. Transcript processing — Cleans raw interview transcripts. Fast model, high volume.
4. Interview summarisation — Produces structured summaries from cleaned transcripts. Capable model.
5. Conversational Q&A (application context) — An AI assistant that answers questions about a specific application’s data. Capable model.
6. Conversational Q&A (interview context) — Same pattern, different context: answers questions about interview content. Capable model.
7. Resume parsing — Extracts structured data from uploaded resumes. Fast model, high volume.
8. Inline validation — Real-time validation of user input against business rules. Fast model, low latency critical.
9. Style suggestions — Recommends descriptive language for evaluation criteria. Fast model.
10. Impact metric suggestions — Suggests measurable outcomes for role requirements. Fast model.
11. AI coaching — Provides real-time guidance during interviews. Capable model, latency-sensitive.
12. Completeness signal — Determines whether sufficient information has been gathered. Fast model.
Twelve features. Twelve database rows. One service class. One admin portal. Zero vendor lock-in.
Cost Visibility Changes Behaviour
When AI usage is invisible — buried in platform fees or lumped into a single vendor invoice — teams don’t optimise. They can’t. You can’t improve what you can’t measure.
The usage dashboard changed how the product owner thinks about AI features. When you can see that the conversational Q&A assistant costs four times more per call than inline validation, you ask better questions: Is the expensive model necessary for this feature? Would a smaller model produce acceptable results at a quarter of the cost? Are there features being called more often than expected — and if so, is that a sign of value or a sign of a UX problem that’s forcing users to retry?
These are product decisions, not engineering decisions. And they should be made by the product owner, not the development team. The dashboard makes that possible.
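Cost attribution of this kind is simple arithmetic once every call is logged with its feature slug. A toy aggregation, with invented prices and token counts, illustrating the four-to-one comparison above:

```python
from collections import defaultdict

# Toy usage log: every AI call records its feature slug, model, and tokens used.
# Prices (in micro-dollars per token) and token counts are invented.
PRICE_PER_TOKEN = {"capable-model": 10, "fast-model": 1}
calls = [
    {"slug": "qa-application", "model": "capable-model", "tokens": 2000},
    {"slug": "qa-application", "model": "capable-model", "tokens": 2000},
    {"slug": "inline-validation", "model": "fast-model", "tokens": 5000},
    {"slug": "inline-validation", "model": "fast-model", "tokens": 5000},
]

spend = defaultdict(int)
count = defaultdict(int)
for c in calls:
    spend[c["slug"]] += c["tokens"] * PRICE_PER_TOKEN[c["model"]]
    count[c["slug"]] += 1

avg_cost = {slug: spend[slug] / count[slug] for slug in spend}
# In this toy data, each Q&A call costs 4x each inline-validation call.
assert avg_cost["qa-application"] == 4 * avg_cost["inline-validation"]
```

The dashboard is this loop over real logs, grouped by slug; the product decisions follow from the numbers.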
What This Pattern Prevents
Vendor lock-in. If your model provider doubles their pricing tomorrow, you change the routing configuration. You don’t rewrite twelve integrations.
Prompt drift. When prompts live in code, they accumulate undocumented changes across branches and deployments. When prompts live in the database with version tracking, every change is deliberate and reversible.
Developer bottlenecks. The product owner doesn’t need to file a ticket to adjust how the AI coaching feature responds. They edit it themselves. The people with domain expertise have direct access to the AI configuration.
Invisible costs. Every AI call is logged, measured, and attributed. There are no surprise invoices. No “we’re spending how much on AI?” moments.
AI Isn’t a Feature. It’s a Layer.
The mistake most teams make with AI is treating it as a feature: “We added AI to our product.” The framing is wrong. You didn’t add AI. You added twelve specific capabilities that happen to use language models as their execution engine. The prompts encode your business logic. The models are commodity infrastructure.
When you build AI as a layer — database, service, admin UI — you get operational control over your intelligence. You can measure it, tune it, swap it, and explain it. When you build AI as a feature — hardcoded prompts, vendor-specific SDKs, no admin tooling — you get a dependency that looks clever until the vendor changes their terms.
The requirements discipline applies here as much as anywhere. Each of those twelve prompts was specified before it was built. The model assignments were deliberate choices, not defaults. The admin portal was a day-one requirement, not a “nice to have.” Every decision was explicit, documented, and reversible.
AI isn’t magic. It’s infrastructure. Build it like infrastructure, and it will serve you for years. Build it like magic, and it will surprise you — rarely in the way you want.