February 10, 2026
10 min read

Replacing Incumbent Systems With AI: The Gap Between Demo and Done


There’s a seductive idea gaining traction in boardrooms: that replacing legacy software is now as simple as subscribing to an AI coding tool and prompting your way to a new system. A weekend. Maybe a week. Show the CEO a working demo and declare victory.

It isn’t that simple. And the organisations that assume it is are building the next generation of legacy problems while congratulating themselves on innovation.

Replacing an incumbent system with an AI-built alternative is closer to a structured re-engineering programme than a coding exercise. The organisations doing this successfully treat AI as a high-speed engineering partner operating inside a disciplined framework of architecture, security, and operations. Not as a shortcut that eliminates the need for engineering judgement.

The difference between these two approaches is the difference between a system that works in a demo and a system that works in production, at scale, under load, with real users doing unexpected things.

Start With a Proof of Concept — But Be Ruthlessly Honest About What It Proves

Before embarking on a full replacement, the first step should almost always be a controlled proof of concept. Not because the technology needs proving — AI can clearly build software — but because the organisation needs proving. Stakeholders need to see it. Sceptics need to touch it. Budget holders need to believe it.

The POC secures confidence, budget, and organisational alignment without prematurely committing to an architecture or taking on operational risk. Consider the audience. The system you’re proposing to replace may have taken two years to build and cost half a million pounds. The people who commissioned it, managed it, and lived with its limitations have a deeply embedded mental model of how long software takes. Telling them AI can replace it in weeks or months — when their lived experience measures these things in years — simply won’t be believed. And it shouldn’t be, without evidence. A working POC converts an abstract claim into something tangible.

It shifts the conversation from “that sounds too good to be true” to “I can see how this works.”

A strong POC demonstrates five things: that AI can fully understand the incumbent system by ingesting its documentation, UI flows, API traffic, and support history; that it can recreate core workflows credibly; that it can improve on known pain points rather than just reproducing them; that it generates structured engineering output — not just code, but requirements, architecture drafts, and test plans; and critically, that there’s a clear list of what the POC deliberately excludes.

That last point is where most organisations get into trouble. Without explicit boundaries, stakeholders see a working demo and assume the system is essentially built. “We’ve basically done it” becomes the narrative. Security hardening, compliance controls, performance engineering, observability, full testing — all of this gets mentally filed as “minor follow-up work” when it’s actually the majority of the effort.

The POC proves the future is possible. It is not the future system itself. That distinction must be explicit from day one, repeated often, and documented in the deliverables. A POC without boundaries is a prototype masquerading as a product.

The Legal Question Nobody Wants to Ask First

Before any technical work begins, establish whether you’re even allowed to replace what exists.

This sounds obvious. It almost never happens early enough.

AI can rapidly reproduce functionality — including proprietary workflows, business logic, and interface patterns. That capability creates legal exposure if the incumbent system is protected by IP, licensing restrictions, non-compete clauses, or data ownership terms buried in supplier contracts.

The questions that matter: Who owns the current system’s IP? Are you licensed to replicate its functionality? Does data extracted from the system belong to you, or to the vendor? Are there API usage constraints that prevent you from reverse-engineering integrations?

The correct mindset is clean-room replacement: replicate outcomes, not protected implementation. Build something that achieves the same business results through independently designed architecture.

This is a legal distinction, not a technical one, and it needs legal review before engineering begins.

Getting this wrong doesn’t just create risk. It can kill the entire programme after significant investment.

Understanding the System You’re Replacing

Here’s where the real work begins, and where AI earns its place as something more than a code generator.

Before building anything, build a complete operational model of the incumbent system. Modern AI tools allow you to construct this by ingesting UI screenshots, API traffic, documentation, support tickets, user complaints, database exports, and browser network traces. Feed it everything.

The goal is a machine-readable system specification.

Not just what the system does, but when it fails, where users struggle, what manual workarounds exist, and what the documentation says versus what actually happens.
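
As a rough illustration of what “machine-readable” can mean in practice, here is a minimal sketch of a specification model in Python. The field names and structure are illustrative assumptions, not a standard; the point is that the output of this phase should be data a later build step can consume, not prose.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """One observed workflow in the incumbent system."""
    name: str
    steps: list[str]
    known_failure_modes: list[str] = field(default_factory=list)
    manual_workarounds: list[str] = field(default_factory=list)
    documented_behaviour: str = ""
    observed_behaviour: str = ""   # what actually happens, per logs and tickets


@dataclass
class SystemSpecification:
    """Machine-readable model of the incumbent system."""
    system_name: str
    workflows: list[Workflow] = field(default_factory=list)
    data_entities: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)   # gaps to resolve with humans

    def undocumented_behaviour(self) -> list[str]:
        """Flag workflows where observed behaviour diverges from the documentation."""
        return [
            w.name for w in self.workflows
            if w.observed_behaviour and w.observed_behaviour != w.documented_behaviour
        ]
```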

This step is one of the highest-ROI uses of AI in the entire programme. A human doing this work manually would spend weeks interviewing stakeholders, reading documentation, and mapping workflows. AI can produce a first draft in hours, which humans then validate and refine.

But the step most teams skip — and the one that causes the most production failures — is mapping every integration and dependency. Most replacement efforts fail not because the core functionality is wrong, but because of hidden integrations. Third-party APIs, authentication flows, webhooks, scheduled jobs, reporting pipelines, email services, file storage, analytics — each one with its own authentication method, rate limits, failure modes, retry logic, and compliance implications.

If this inventory is incomplete, your replacement will fail in production. Not might. Will.
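
As a hedged sketch, here is what one entry in that inventory might capture, using illustrative field names. The specific attributes will vary, but every integration should answer the same questions about authentication, limits, and failure behaviour.

```python
from dataclasses import dataclass


@dataclass
class IntegrationDependency:
    """One external dependency of the incumbent system."""
    name: str
    kind: str                 # "api", "webhook", "scheduled_job", "file_transfer", ...
    auth_method: str          # "oauth2", "api_key", "hmac_signature", ...
    rate_limit: str           # e.g. "600 requests/min", or "unknown"
    failure_mode: str         # what happens downstream when this dependency is unavailable
    retry_policy: str         # e.g. "exponential backoff, 5 attempts"
    compliance_notes: str     # data residency, PII exposure, contractual constraints
    owner: str                # team or vendor accountable for the integration


# Illustrative entry; real inventories typically run to dozens of these.
payments_webhook = IntegrationDependency(
    name="payments-provider-webhook",
    kind="webhook",
    auth_method="hmac_signature",
    rate_limit="unknown",
    failure_mode="orders marked unpaid until manual reconciliation",
    retry_policy="provider retries for 24 hours, then drops the event",
    compliance_notes="carries cardholder reference IDs, PCI scope",
    owner="payments vendor",
)
```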

Equally important: study user dissatisfaction. A replacement that reproduces known problems is a missed opportunity. Feed AI the support tickets, feature requests, NPS feedback, and Slack complaints. Ask it to produce a usability failure map — the most hated features, the most error-prone workflows, the hidden manual processes that exist because the software doesn’t work properly. This turns a risky like-for-like rebuild into replacement plus improvement, which is a fundamentally easier sell to stakeholders.
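
One lightweight way to make that failure map concrete is to aggregate tagged support tickets into a ranked list before asking the AI to analyse the underlying text. A minimal sketch, assuming tickets have already been tagged with the workflow they relate to:

```python
from collections import Counter


def usability_failure_map(tickets: list[dict]) -> list[tuple[str, int]]:
    """Rank workflows by how often they appear in support tickets.

    Each ticket is assumed to be a dict with a 'workflow' tag and a
    'category' field such as 'bug', 'confusion', or 'workaround'.
    """
    complaints = Counter(
        t["workflow"] for t in tickets
        if t.get("category") in {"bug", "confusion", "workaround"}
    )
    return complaints.most_common()


tickets = [
    {"workflow": "invoice export", "category": "workaround"},
    {"workflow": "invoice export", "category": "bug"},
    {"workflow": "user onboarding", "category": "confusion"},
]
print(usability_failure_map(tickets))
# [('invoice export', 2), ('user onboarding', 1)]
```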

Architecture Is a Human Decision

Only once the system is fully understood should architecture begin. And architecture is where human judgement matters most.

Do not let the AI decide as it goes.

This is the system replacement equivalent of vibe coding, and it produces the same results: something that works until it doesn’t, with no coherent structure to debug or extend.

Define the core architecture deliberately. Monolith versus modular versus services. Data model ownership. Integration patterns — event-driven versus synchronous. Authentication model. Multi-tenancy. Hosting model, including region and compliance requirements. Observability stack. AI can propose options and articulate trade-offs — but humans select direction. The AI is the engineering engine. It is not the architect.
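
One way to keep those human decisions explicit is to record each one in a lightweight decision record that the AI can be pointed back at later. A minimal sketch with illustrative fields, not any formal ADR standard:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchitectureDecision:
    """A single, human-approved architecture decision."""
    title: str
    options_considered: list[str]   # typically proposed by the AI
    decision: str                   # selected by a named human
    rationale: str
    decided_by: str
    decided_on: date


adr_001 = ArchitectureDecision(
    title="Service integration style",
    options_considered=["synchronous REST calls", "event-driven via message queue"],
    decision="event-driven via message queue",
    rationale="Incumbent integrations are bursty; decoupling absorbs third-party downtime.",
    decided_by="Head of Engineering",
    decided_on=date(2026, 2, 10),
)
```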

And critically: create production-grade environments from day one. Dev, staging, production. CI/CD pipelines. Secrets management. Logging and monitoring. Backup strategy. AI can generate all of this infrastructure code — Docker, Terraform, CI pipelines, deployment scripts — but the structure must be defined first. Avoid the “prototype then rebuild” trap where teams build everything in a single environment and then spend months untangling it into something deployable.
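
Defining the structure first can be as simple as writing down, in one place, what every environment must contain before any infrastructure code is generated. A hedged sketch of that checklist as data, with a check that flags anything still missing; the component names are illustrative.

```python
REQUIRED_IN_EVERY_ENVIRONMENT = {
    "ci_cd_pipeline",
    "secrets_management",
    "logging",
    "monitoring",
    "backups",
}

# Illustrative environment definitions; the point is that this shape is
# decided by humans before any Terraform, Docker, or pipeline code exists.
ENVIRONMENTS = {
    "dev":        {"ci_cd_pipeline", "secrets_management", "logging"},
    "staging":    {"ci_cd_pipeline", "secrets_management", "logging", "monitoring", "backups"},
    "production": {"ci_cd_pipeline", "secrets_management", "logging", "monitoring", "backups"},
}


def missing_components(env: str) -> set[str]:
    """Return the required components an environment does not yet have."""
    return REQUIRED_IN_EVERY_ENVIRONMENT - ENVIRONMENTS[env]


for env in ENVIRONMENTS:
    gaps = missing_components(env)
    if gaps:
        print(f"{env}: missing {', '.join(sorted(gaps))}")
```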

Security Is Not a Phase — It’s a Foundation

This is where most AI-built replacements fail. Not because the code is insecure, but because security is treated as a later concern rather than a foundational one.

From the first commit, implement static analysis, dependency scanning, secret scanning, container scanning, and infrastructure-as-code scanning. Make security continuous, not post-build.

CI pipelines should block on security findings. Vulnerabilities should be visible in a dashboard, not discovered in a penetration test six months later.
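
As an illustration of what “block on security findings” can look like for a Python codebase, here is a minimal gate script assuming the open-source scanners bandit (static analysis) and pip-audit (dependency scanning) are installed; exact flags and exit behaviour vary by tool and version.

```python
import subprocess
import sys

# Each entry pairs a scanner with the command that runs it.
# Tool choice and flags are illustrative; add your secret, container,
# and infrastructure-as-code scanners to the same gate.
SCANS = {
    "static analysis (bandit)": ["bandit", "-r", "src", "-q"],
    "dependency audit (pip-audit)": ["pip-audit"],
}


def run_security_gate() -> int:
    """Run each scanner; return non-zero if any of them reports findings."""
    failures = []
    for name, command in SCANS.items():
        result = subprocess.run(command)
        if result.returncode != 0:   # these tools exit non-zero on findings
            failures.append(name)
    if failures:
        print(f"Security gate failed: {', '.join(failures)}")
        return 1
    print("Security gate passed.")
    return 0


if __name__ == "__main__":
    sys.exit(run_security_gate())
```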

If you’re replacing a regulated system — finance, health, insurance, legal — data classification, encryption, access controls, audit logging, retention policies, and incident response plans are non-optional. They’re not “hardening” to be done before launch. They’re architectural decisions that shape the entire build.

The good news is that AI makes this dramatically easier. We recently set up six layers of continuous security scanning in a single working session. The tools are free. The configuration is automatable. The excuses for shipping without security scanning have evaporated.

The Build Framework: Structured AI, Not Vibe Coding

AI must operate within defined guardrails. The difference between a successful AI-built replacement and a fragile mess is not the AI’s capability — it’s the framework it operates inside.

Provide the AI with architecture documents, coding standards, security requirements, testing requirements, dependency policies, deployment patterns, and documentation standards. This becomes the AI’s engineering handbook. Without it, output quality degrades quickly. The AI starts making locally reasonable decisions that are globally inconsistent. Different modules follow different patterns. Authentication works three different ways. Error handling is improvised per feature.

As features are built, generate design documents, record architectural decisions, capture integration contracts, and document data models. Treat documentation as a first-class artifact, not an afterthought.

This isn’t bureaucracy — it’s the mechanism that allows future AI agents (and humans) to understand system intent, maintain coherence, and reduce knowledge loss.

Every feature should have a requirements spec, a design document, a build plan, tests, a security review, and documentation. AI executes. Humans review. This is structured engineering at AI speed, not AI replacing engineering.
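
A simple way to make that per-feature discipline enforceable is to treat it as a checklist the pipeline can verify. A minimal sketch, assuming each feature folder is expected to contain these artifacts under illustrative file names:

```python
from pathlib import Path

# Illustrative artifact names; the real convention is whatever your
# engineering handbook specifies.
REQUIRED_ARTIFACTS = [
    "requirements.md",
    "design.md",
    "build-plan.md",
    "tests",
    "security-review.md",
    "README.md",
]


def incomplete_features(features_root: str) -> dict[str, list[str]]:
    """Map each feature folder to the artifacts it is still missing."""
    gaps: dict[str, list[str]] = {}
    for feature in sorted(Path(features_root).iterdir()):
        if not feature.is_dir():
            continue
        missing = [a for a in REQUIRED_ARTIFACTS if not (feature / a).exists()]
        if missing:
            gaps[feature.name] = missing
    return gaps


if __name__ == "__main__":
    for feature, missing in incomplete_features("features").items():
        print(f"{feature}: missing {', '.join(missing)}")
```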

Testing Is Non-Negotiable

AI-built systems must be heavily tested — arguably more heavily than human-built ones, because the speed of generation is also the speed at which subtle inconsistencies are introduced.

Unit tests, integration tests, API contract tests, end-to-end tests, load tests, security tests. AI can generate test suites, create synthetic data, simulate edge cases, analyse failures, and expand coverage. But humans define acceptance criteria.

The AI doesn’t know what “correct” means for your business — it only knows what you’ve told it.
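
One practical consequence is that acceptance criteria are worth writing as executable tests that humans own, even when the AI generates the bulk of the suite. A hedged sketch using pytest, with a hypothetical calculate_refund function standing in for real business logic:

```python
import pytest

from billing import calculate_refund  # hypothetical module under test


# Human-owned acceptance criteria: each case states what "correct" means
# for the business, which the AI cannot infer on its own.
@pytest.mark.parametrize(
    "days_since_purchase, amount_paid, expected_refund",
    [
        (5, 100.00, 100.00),    # full refund inside the 14-day window
        (20, 100.00, 50.00),    # half refund between 15 and 30 days
        (45, 100.00, 0.00),     # no refund after 30 days
    ],
)
def test_refund_policy(days_since_purchase, amount_paid, expected_refund):
    assert calculate_refund(days_since_purchase, amount_paid) == expected_refund
```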

The Parallel Run: Where Hidden Logic Surfaces

Before switching off the incumbent system, run both in parallel. Compare outputs. Validate edge cases. Monitor performance. Confirm data integrity. Test failure scenarios.

This stage almost always reveals hidden logic — business rules that exist in production but weren’t documented, edge cases that only trigger on the third Tuesday of months with 31 days, integrations that behave differently under load than in testing.

The parallel run is not optional validation. It’s the stage that converts confidence into certainty. Skip it and you’re gambling that your understanding of the incumbent system was complete. It wasn’t. It never is.
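
A minimal sketch of what the comparison harness in a parallel run can look like: replay the same read-only requests against both systems and log every divergence. The endpoint paths and hostnames here are illustrative assumptions.

```python
import json
import urllib.request


def fetch(base_url: str, path: str) -> dict:
    """GET a JSON response from one of the two systems."""
    with urllib.request.urlopen(base_url + path) as response:
        return json.load(response)


def compare_systems(incumbent_url: str, replacement_url: str, paths: list[str]) -> list[str]:
    """Replay identical requests against both systems and report divergences."""
    divergences = []
    for path in paths:
        old = fetch(incumbent_url, path)
        new = fetch(replacement_url, path)
        if old != new:
            divergences.append(f"{path}: incumbent={old!r} replacement={new!r}")
    return divergences


if __name__ == "__main__":
    # Illustrative endpoints; in practice these come from captured production traffic.
    mismatches = compare_systems(
        "https://legacy.example.internal",
        "https://replacement.example.internal",
        ["/api/orders/1042", "/api/customers/87/balance"],
    )
    for line in mismatches:
        print(line)
```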

Operating What You’ve Built

Replacing a system means operating one. This is obvious but frequently underestimated.

Structured logging, metrics, alerts, tracing, error tracking, usage analytics — all of this must exist before users arrive, not after the first incident. AI can analyse logs and suggest fixes, but instrumentation must be in place for there to be anything to analyse.
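
As a minimal sketch of what structured logging can look like with the Python standard library, here is a formatter that emits one JSON object per event so logs are queryable from day one; the field names are illustrative.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach any structured context passed via the `extra` argument.
        event.update(getattr(record, "context", {}))
        return json.dumps(event)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("replacement")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Illustrative usage: structured fields travel with the event, ready for alerting.
logger.info("order created", extra={"context": {"order_id": 1042, "tenant": "acme"}})
```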

Post-launch, the work continues: continuous vulnerability scanning, dependency updates, penetration testing, backup validation, cost monitoring, performance tuning. AI assists with all of it. But governance — who decides what gets patched, what gets prioritised, what constitutes acceptable risk — that remains a human responsibility.

The Real Lesson

Replacing an incumbent system with an AI-built alternative is now entirely viable. It’s often dramatically faster and cheaper than traditional rebuilds. But success depends far less on prompting ability and far more on legal clarity, system understanding, architecture discipline, security integration, and operational readiness.

The organisations that get this right follow a consistent sequence: proof of concept for credibility, full system understanding with AI assistance, deliberate architecture and planning, security and DevOps foundations, structured AI build, parallel run and replacement. Trying to jump straight to the build phase — which is what most teams want to do, because it feels like progress — usually fails.

AI accelerates execution. It does not remove the need for engineering judgement.

The teams that understand this distinction are building systems that will last. The teams that don’t are building the next generation of technical debt, faster than ever before.