Back to blog
AI-Assisted Development

How AI-Generated Code Exposes the Pipeline You Built for a Different Era

The real risk isn't speed. It's a pipeline designed for a time when engineers controlled how fast code arrived. AI-generated code exposes every assumption the old structure quietly made.

By , systems architect and founder of Curfee. Background spans logistics, software engineering, and architecture across automotive, public sector, and software consulting.

The real risk with AI-generated code is not speed. It is a pipeline designed for a different time – a time when engineers wrote the code, and pull request volume was bounded by how fast they could type, think, and context-switch.

That assumption no longer applies. Once AI generates code, commits arrive faster than the original structure can absorb. Throughput shifts. The review gate does not. In most organisations, either quality drops, morale drops, or both.

The pattern I see repeatedly: teams assume the bottleneck is behavioural. It is not. It is structural.

Why the Old Pipeline No Longer Fits

Most CI/CD pipelines running in production today share a common shape. They trust engineers to remember conventions. They rely on reviewers to catch violations. They treat CI as a safety net for test failures, not as a structural filter for architectural decisions. Most real enforcement happens in an engineer’s head, during a five-minute review window between other tasks.

That design worked because volume was bounded. A senior engineer could keep the system landscape in their head, remember the naming convention, verify that a new service respected the layering rules, and push back when something drifted. The pipeline did not need to enforce the structure because the engineers in front of it did.

Once AI-assisted code generation joins the loop, that balance collapses. A team shipping at four times the previous velocity is not doing four times the review. The same review capacity spreads across more commits – less time on each change. What used to be bounded by engineer typing speed is now bounded by prompt iteration speed, which has no upper limit.

The gap between what the pipeline tolerates and what the architecture requires defines the operating reality of the codebase. Whatever the pipeline lets through is the de facto standard, regardless of what the documentation says.

Why Tighter Review Shifts the Bottleneck

A common response to rising AI-generated contribution volume is to tighten the review gate. Add another approver. Require a staff engineer sign-off. Introduce a pre-review checklist. Extend the review SLA.

That works, but it shifts the bottleneck to the engineers.

Throughput drops because every pull request now waits longer for engineer attention. Technical debt still accumulates, only more slowly. The team feels the friction – reviewer fatigue, merge queues, senior engineers spending their days in review tools instead of designing systems. Morale suffers in a specific way: the engineers who were hired to architect become clerks for machine output.

The math is unchanged. More contributions hitting a fixed-capacity review gate produces a queue, not quality. The queue itself becomes the next bottleneck.

Adding more reviewers to a pipeline that was never designed to filter at this volume is like hiring more checkout clerks because the line is long. It moves the pressure point without removing it. The pipeline itself is the checkout design.

If that framing fits your current situation, a structural diagnosis is often the faster way to see where the pipeline is absorbing the shift and where it is breaking.

What AI-Generated Code Actually Exposes

An AI does not get tired. It does not forget the naming convention. It does not remember it either. It produces exactly what the system tolerates.

Engineers have internalised what “clean” looks like in this codebase, even when it is not written down. They self-correct. An AI does not self-correct against unwritten rules – it corrects against the signals the system gives back.

If the pipeline accepts a pull request with missing contract tests, the system has told the AI that contract tests are optional. If it merges code with untyped boundaries, untyped boundaries are acceptable. If an architectural layer is violated and nothing breaks, the layer is decorative.

At scale, that feedback loop defines the architecture. Not the diagrams. Not the decision records. The pipeline’s tolerance is the real architectural contract, and AI-generated code will find every edge of it, consistently, at volume.

The pipeline is the opening, not the AI. The AI produces against the signal the system provides. Change the signal and the output changes.

Encoding the Non-Negotiable Parts

When the pipeline has to enforce structural rules it was never designed to enforce, the shift is from review-driven enforcement to system-driven enforcement. Decisions that used to live in an engineer’s head get encoded precisely enough that a machine can check them.

The mechanisms are specific, and most have existed for years, mostly underused.

Architectural fitness functions encode structural rules as executable checks. A fitness function can verify that the domain layer never imports from infrastructure, that no service talks to another service’s database directly, that cyclic dependencies do not exist. Neal Ford’s work and the ThoughtWorks Technology Radar treat this as a core practice for evolutionary architecture, and InfoQ has documented the patterns for embedding them into CI.

Contract testing verifies that the agreed contract between two services still holds on every change. When a producer changes a schema without the consumer’s agreement, the pipeline fails. The decision – “services agree on schemas before changing them” – is enforced on every commit, not negotiated in Slack.

Coverage thresholds that block rather than warn turn a soft signal into a hard gate. A coverage drop below the threshold fails the build. Not a yellow warning. Not a comment in the review. A red pipeline.

Type checking at module boundaries catches contract violations before runtime. Dependency rules enforced via tools like ArchUnit, dependency-cruiser, or language-native linters block imports that cross layers they should not cross. Security scans fail the build when a known vulnerability enters a dependency. Performance budgets fail the build when a page-weight or latency threshold is exceeded.

The pattern is identical across all of them: the decision is recorded in executable form. The pipeline enforces it. Green or red. No engineer has to remember.

The Honest Part

Yes, some engineering tasks disappear.

Catching naming violations, verifying layer dependencies, reviewing boilerplate, checking that test files sit next to source files, confirming a new endpoint follows the REST conventions the team agreed on last year – the pipeline absorbs all of that. For engineers who spent most of their time there, the role changes significantly.

That change is not abstract. A reviewer whose value proposition was catching structural drift has less to do when the pipeline catches it first. A staff engineer whose unofficial job was being the keeper of architectural memory no longer has to keep it in their head, and no longer gets credit for doing so. The informal power that came from knowing the unwritten rules dissipates when those rules are written down as executable checks.

What remains, and what becomes more valuable, is judgment. Knowing which boundaries to draw and why. Understanding what the system should never tolerate, and encoding that precisely enough that a machine can enforce it. The framing of a new fitness function, the choice of which invariants to protect, the decision to raise or lower a coverage threshold based on risk – each is an architectural act.

The engineers who defined the guardrails are not replaced by the pipeline. They are the pipeline – their decisions, rendered executable.

The shift reshapes what seniority means. Less time on review queues. More time on what the system should enforce and why. That is a harder job, a more consequential one, and one most engineers were hired to do in the first place.

Why Speed Stops Being the Risk

Once the structure is encoded, speed stops being the risk.

AI coding tools can submit pull requests endlessly. Red pipeline: rejected, iterated. Green pipeline: merged. Engineer attention per commit drops because the structural checks happen before a reviewer ever opens the diff. Reviewers look at intent and design, not at whether the naming is right or the layer is respected.

The counter-intuitive part: AI-generated code gets faster once the structure is non-negotiable. No engineer bottleneck in front of a clean change. The AI keeps iterating until the system agrees. The iteration loop that used to take hours – submit, wait for review, receive comments, revise, wait again – shrinks to minutes because the feedback is automatic and immediate.

Teams that reach this state report a specific shift in how contributions feel. The anxiety around AI-generated code fades. Reviewer fatigue fades. What remains is a codebase that keeps its shape even when volume spikes, even when new commits – from engineers or from AI coding tools – arrive without institutional memory.

The underlying principle is direct. Structure that depends on engineers remembering will fail the moment attention shifts. Structure encoded in the pipeline gives feedback on its own, survives reorganisations, survives new sources of code, and makes the architecture findable and challengeable when it needs to change.

Where This Leaves the Review Gate

Review doesn’t disappear. It moves, and it changes.

It moves upstream: architectural decisions are made explicit before code is written – the fitness function, the contract, the dependency rule, the performance budget. That conversation happens once, at design time, with the people who should be having it. The result is an executable artefact, not a Confluence page.

It changes in focus: the review that remains is about intent and design, not structural compliance. A reviewer looks at a green pull request and asks whether the change makes sense, whether the abstraction is right, whether the approach aligns with where the system is going. Those are the questions senior engineers were always supposed to be asking. The pipeline clears the noise so the signal can land.

Organisations that make this shift find their review culture improves rather than degrades. Reviewers engage with meaningful changes. Engineers and AI coding tools alike receive immediate structural feedback rather than delayed reviewer feedback. The system landscape becomes explicit, version-controlled, and testable. What was tacit becomes visible. What was visible becomes challengeable.

Teams underestimate how much this changes governance. An implicit decision buried in collective memory cannot be challenged, because it cannot be found. The new engineer doesn’t know to question it. The consultant reviewing the architecture doesn’t see it. The team itself forgets who made it and why. An explicit decision encoded in a pipeline check is different: it can be read, questioned, refined, or removed when it becomes obsolete. Enforcement and findability are the same mechanism.

This is the work Curfee does with companies whose pipelines were built for a different time and whose commit volume has outgrown the original design – making visible where the pipeline is quietly enforcing architectural decisions and where it is not, then defining what the system should enforce so decisions stand without depending on any engineer paying attention.

When structure is the last line of defence, speed becomes a feature, not a risk.

References

  1. 01 Building Evolutionary Architectures https://martinfowler.com/articles/evo-arch-forward.html
  2. 02 Architectural Fitness Function https://www.thoughtworks.com/radar/techniques/architectural-fitness-function
  3. 03 Fitness Functions Architecture https://www.infoq.com/articles/fitness-functions-architecture/
  4. 04 Contract Test https://martinfowler.com/bliki/ContractTest.html