By Alan Leard at Evolve Advising
There's a quiet shift happening in how software gets built. For decades, the process has looked roughly the same: a business user has an idea, writes it up (or tries to), hands it to a product manager, who translates it for engineers, who then spend weeks building something that may or may not match what was originally envisioned.
AI is collapsing that chain. Today it's possible to design a pipeline where a business user — someone with no coding background — initiates a development request and AI carries it all the way to a completed pull request, ready for an engineer to review and merge into production. The engineering team doesn't touch it until the code is written, tested, and waiting for their approval.
This post walks through how to set that up.
The Core Idea
The traditional handoff between business and engineering is where most projects lose time, context, and fidelity. Requirements get misunderstood. Priorities get reshuffled. Simple requests sit in backlogs for months.
An AI-driven pipeline eliminates those handoffs by letting business users describe what they need in plain language and having AI agents handle the translation into working code. Engineers shift from builders to reviewers — a role that's arguably more valuable and far more efficient.
One important distinction: this pipeline is about build-time AI — AI agents that write, test, and review code during the development process. It's separate from runtime AI, where a product uses AI features in production (chatbots, recommendation engines, intelligent search). Build-time AI is how the software gets made. Runtime AI is what the software does. This pipeline applies whether you're building an AI-powered product or a traditional CRUD application.
The Pipeline, Step by Step
Step 1: Structured Intake
Every pipeline starts with a front door. Business users need a clear, simple way to submit what they want — and the system needs to capture enough context for AI to act on it.
This means building an intake form or conversational interface that collects the essentials: what the user wants to accomplish, who it's for, what success looks like, and any constraints (timeline, systems involved, compliance requirements). The key is to keep it in business language. No technical jargon, no implementation details.
The experience should feel seamless — a browser extension that lets the business user highlight an element on screen and describe what they want changed, a Slack workflow, a form in Jira or Linear. The less it feels like "submitting a ticket" and the more it feels like pointing at something and saying "fix this," the more adoption you'll get. The important thing is consistency — every request follows the same format so the downstream AI agents can reliably interpret it.
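To make the intake format concrete, here is a minimal sketch of what a structured intake record might look like. The field names are illustrative, not a standard — the point is that every request carries the same business-language essentials so downstream agents can interpret it reliably.

```python
from dataclasses import dataclass, field

# Hypothetical intake record -- field names are illustrative, not a standard.
@dataclass
class IntakeRequest:
    goal: str                      # what the user wants to accomplish, in business language
    audience: str                  # who it's for
    success_criteria: str          # what success looks like
    constraints: list[str] = field(default_factory=list)  # timeline, systems, compliance

    def missing_fields(self) -> list[str]:
        """Names of required fields left blank, so the form can prompt for them."""
        return [name for name in ("goal", "audience", "success_criteria")
                if not getattr(self, name).strip()]
```

A form or Slack workflow built on a schema like this can refuse to submit until the essentials are filled in — the first, simplest gate in the pipeline.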
Step 2: AI-Powered Requirements Refinement
Raw business requests are rarely complete enough to code against. This is where an AI requirements agent steps in. Think of it as an automated business analyst.
The agent reviews the intake submission and asks follow-up questions: "When you say 'update the dashboard,' which metrics should change?" or "Should this apply to all users or just admins?" This back-and-forth happens in natural language, directly with the business user, until the requirements are specific enough to act on.
The output is a structured requirements document — written in plain language but detailed enough that a coding agent (or a human engineer) could build from it without ambiguity.
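The control flow of that refinement loop can be sketched without any AI at all. In this toy version a rule table stands in for the LLM that would actually generate follow-up questions, and `answer_fn` plays the business user — both are stand-ins, not real components.

```python
# Sketch of the refinement loop's control logic. In production an LLM generates
# the follow-up questions; here a rule table stands in so the flow is testable.
REQUIRED_DETAILS = {
    "affected_users": "Should this apply to all users or just admins?",
    "metrics": "Which metrics on the dashboard should change?",
    "acceptance": "How will you know this is working correctly?",
}

def follow_up_questions(requirements: dict) -> list[str]:
    """Return the questions still needed before the spec is buildable."""
    return [q for key, q in REQUIRED_DETAILS.items() if key not in requirements]

def refine(requirements: dict, answer_fn) -> dict:
    """Loop until no open questions remain; answer_fn plays the business user."""
    while follow_up_questions(requirements):
        for key, question in REQUIRED_DETAILS.items():
            if key not in requirements:
                requirements[key] = answer_fn(question)
    return requirements
```

The shape matters more than the rules: the loop does not release a spec until every open question has an answer.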
Step 3: Interactive Prototyping
Judging from conversations with CEOs who want to participate directly in development, this is often the most important step.
Before backend development begins, an AI prototyping agent builds the new feature directly into the existing application using the company's actual UI framework — React, Vue, Angular, Blazor, or whatever the team uses. This isn't a static wireframe or a standalone mockup. It's the real application with the new feature wired in, using mocked data to simulate backend responses. Dropdowns populate. Tables fill with realistic sample data. Filters work. Forms submit and show confirmation states — all within the context of the product the business user already knows.
Business leaders don't give their best feedback on static designs. They give their best feedback when they can interact with something. "Move the filter to the top" becomes obvious when you're trying to use it and the filter is buried. The feedback is grounded in experience, not imagination.
Building in the actual framework also means this code isn't throwaway — it becomes the starting point for production development. Tools like Claude Code, Codex, and Gemini running against the project's component library can generate these prototypes from the requirements document in minutes. This is also where the pipeline differentiates itself from tools like Lovable or Replit used in standalone mode. Those tools are great for quick proofs of concept, but as of this writing, Lovable projects ship with zero test automation, no security scanning, and no connection to your existing architecture. The pipeline takes that same rapid prototyping energy and channels it into a governed process.
The business user interacts with the prototype, requests changes, and the AI iterates until the experience feels right. Only then does the pipeline move forward.
In practice, many organizations find it valuable to maintain this as a permanent "two universes" model. The business user keeps a fast, ungoverned prototyping environment where they can experiment freely — trying ideas, exploring directions, moving at full speed with no guardrails. When they're ready to bring something into production, they carve off the piece they want and hand it to the governed pipeline. Think of it as the difference between wireframing and production implementation. The prototype universe is where ideas get validated. The production pipeline is where they get built properly.
Step 4: Architecture and Task Decomposition
Once requirements are locked and the prototype is approved, an architecture agent takes over. This agent understands the existing codebase, the tech stack, and the project's conventions. It breaks the requirement down into discrete development tasks: which files need to change, what new components are needed, what tests should be written, and what security or compliance considerations apply.
This step is critical because it's where institutional knowledge matters. The architecture agent needs access to the codebase, documentation, and a knowledge base of past decisions and patterns. Without this context, AI-generated code tends to drift from the project's conventions, creating review headaches later.
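The architecture agent's output is essentially a dependency-ordered task list. A sketch of that structure, with illustrative fields and a hypothetical ordering function (real agents would attach far more context per task):

```python
from dataclasses import dataclass, field

# Illustrative task record produced by a hypothetical architecture agent.
@dataclass
class DevTask:
    id: str
    description: str
    files: list[str]                               # files this task touches
    depends_on: list[str] = field(default_factory=list)

def execution_order(tasks: list[DevTask]) -> list[str]:
    """Topologically sort tasks so nothing runs before its dependencies."""
    done, order = set(), []
    remaining = {t.id: t for t in tasks}
    while remaining:
        ready = [t for t in remaining.values() if set(t.depends_on) <= done]
        if not ready:
            raise ValueError("circular dependency between tasks")
        for t in sorted(ready, key=lambda t: t.id):
            order.append(t.id)
            done.add(t.id)
            del remaining[t.id]
    return order
```

Explicit dependencies also let the orchestrator in the next step run independent tasks in parallel while serializing the ones that build on each other.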
Step 5: Code Generation
With a clear task list in hand, a coding agent writes the actual code. Modern AI coding tools can generate functional code from well-defined specifications. In more sophisticated setups, this isn't a single agent working alone. An orchestrating agent can delegate sub-tasks to specialized sub-agents — one handling front-end components, another writing API logic, another generating database migrations — coordinating them as a team. These agents leverage skills (reusable instruction sets tuned to your project's patterns), hooks (automated triggers that enforce standards at each step), and tool calling (the ability to interact with external services, APIs, and development tools directly) to produce code that fits your codebase rather than drifting from it.
The key is scope control. Each task should be small and well-defined. "Add a date filter to the transactions API endpoint" is a good task. "Redesign the reporting module" is not. The quality of the generated code depends heavily on how well the decomposition in Step 4 was done.
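The orchestration pattern described above can be sketched with a simple registry that routes each sub-task to a specialized handler. The handlers here are stand-ins that record what they were asked to build; in a real setup each would invoke a coding sub-agent.

```python
# Minimal orchestration sketch: a registry routes each sub-task to a
# specialized handler. Real handlers would invoke coding sub-agents; these
# stand-ins just describe what they were asked to build.
from typing import Callable

HANDLERS: dict[str, Callable[[str], str]] = {}

def sub_agent(kind: str):
    """Register a handler for one kind of sub-task."""
    def register(fn):
        HANDLERS[kind] = fn
        return fn
    return register

@sub_agent("frontend")
def build_component(spec: str) -> str:
    return f"component for: {spec}"

@sub_agent("api")
def build_endpoint(spec: str) -> str:
    return f"endpoint for: {spec}"

def orchestrate(tasks: list[tuple[str, str]]) -> list[str]:
    """Dispatch (kind, spec) pairs; unknown kinds escalate to a human."""
    results = []
    for kind, spec in tasks:
        if kind not in HANDLERS:
            raise LookupError(f"no sub-agent for {kind!r}; escalate")
        results.append(HANDLERS[kind](spec))
    return results
```

The escalation branch is deliberate: an orchestrator should refuse work it has no specialist for rather than improvise outside its configured capabilities.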
Step 6: Local Testing and Self-Correction
This is the agent's inner development loop — the same cycle a human developer runs at their desk before pushing code.
The coding agent runs the project's existing test suite and generates new tests specific to the change — unit tests, integration tests, and regression tests. A security scanning layer runs in parallel: static application security testing (SAST), dependency scanning for known vulnerabilities, and secrets detection to catch accidentally committed credentials. Static analysis, linting, and code style enforcement also run here. None of these checks are advisory — they're all blocking. A failed security scan stops the loop just as firmly as a failed unit test. There is no "we'll fix it later" path. This rigor also pays commercial dividends — organizations pursuing SOC 2, HIPAA, or other compliance certifications find that the pipeline's enforced security scanning and audit trails turn compliance from a painful retrofit into a natural byproduct of how software already gets built. For companies approaching enterprise customers or preparing for due diligence, that's a sales enabler.
When tests or scans fail — and they will — the agent doesn't stop and flag a human. It attempts to fix the issue, re-runs the full suite, and tries again. This loop of generate, test, fix is where the pipeline gains its real efficiency. Set a reasonable limit on retries (three to five attempts) before escalating. If the agent can't resolve it, it documents what it tried and why it failed, so the eventual human reviewer has full context.
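The loop itself is small. In this sketch, `run_checks` and `attempt_fix` are stand-ins for the real test/scan suite and the coding agent — the point is the retry cap and the attempt log that gets handed to the eventual human reviewer.

```python
# Sketch of the generate-test-fix loop with a retry cap. run_checks and
# attempt_fix are stand-ins for the real test/scan suite and the coding agent.
def self_correct(run_checks, attempt_fix, max_attempts: int = 3):
    """Return (passed, attempt_log); the log gives the reviewer full context."""
    log = []
    failures = run_checks()              # empty list means every gate is green
    for attempt in range(1, max_attempts + 1):
        if not failures:
            break
        log.append(f"attempt {attempt}: fixing {failures}")
        attempt_fix(failures)            # the agent tries to repair the failures
        failures = run_checks()          # then the full suite runs again
    return not failures, log             # on exhaustion, escalate with the log
```

Returning the log even on success is a small but useful choice: it shows the reviewer how much self-correction the change needed, which is itself a quality signal.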
In practice, the rigor of these gates should vary by environment. In a development branch, you run minimal checks to keep iteration fast — the business user and the AI need a tight feedback loop. In a test environment, architecture validation kicks in. In staging, full security scanning and comprehensive test suites run. By the time code is heading toward production, every gate is enforced. This layered approach gives the pipeline speed where it matters and rigor where it counts.
Step 7: AI Code Review
Before the code leaves the agent's local environment, an AI review agent evaluates it the way a senior engineer would — checking architectural consistency, maintainability, error handling, and efficiency. This isn't a duplicate of the linting and static analysis from Step 6. It's a higher-level review: does this code fit the project's architecture? Are there better patterns available? Will this be maintainable six months from now?
The agent produces comments tied to specific lines of code, categorized by severity. Blocking issues get sent back to the coding agent for correction before anything is pushed. Suggestions and observations get carried forward into the eventual pull request for the human reviewer. Catching architectural problems here — before a preview is deployed or CI/CD resources are spent — saves significant time downstream.
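A plausible shape for that review output and its routing — the severity names here are assumptions, not a standard taxonomy:

```python
from dataclasses import dataclass

# Illustrative shape for AI review output; severity names are assumptions.
@dataclass
class ReviewComment:
    file: str
    line: int
    severity: str   # "blocking" | "suggestion" | "observation"
    message: str

def triage(comments: list[ReviewComment]):
    """Split review output: blocking issues go back to the coding agent,
    everything else is carried forward into the pull request."""
    blocking = [c for c in comments if c.severity == "blocking"]
    carry_forward = [c for c in comments if c.severity != "blocking"]
    return blocking, carry_forward
```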
Step 8: Documentation Generation
The pipeline generates documentation automatically as part of every change, in two categories.
First, developer and AI-developer documentation: inline code comments, API docs, architectural decision records, and updated READMEs. This category matters more than most teams realize. Future pipeline runs will involve AI agents working on the same codebase, and those agents rely on documentation to understand context. Well-documented code produces better AI-generated code downstream. Poorly documented code leads to compounding drift over time.
Second, user-facing documentation: help content, release notes, or knowledge base articles written in plain language. The AI has everything it needs — the original request, the refined requirements, and the code itself. Screenshots and interactive references get added once the preview environment is live in a later step.
The pipeline treats documentation updates as a blocking requirement — just like tests and security scans. Documentation is part of the deliverable, not an afterthought.
Step 9: Pull Request Creation
Once the code passes local tests, AI code review, and documentation requirements, the pipeline creates a pull request — not a bare-bones one, but a thorough package that includes: a description of what changed and why, a link to the original business request, the AI code review results, test and security scan results, links to new documentation, and a summary of decisions the AI made along the way.
For the business user, this step should be invisible. They don't need to understand pull requests, branching strategies, or Git workflows. They type "submit work" and the pipeline handles the rest — creating the branch, organizing the commits, generating the PR description, and routing it to the right reviewers. The complexity lives in the pipeline. The business user's experience stays simple.
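Assembling that thorough PR package is mostly a templating exercise over the pipeline's accumulated artifacts. A sketch, with illustrative artifact keys (the body it produces could be posted through any PR platform's API):

```python
# Sketch of assembling the PR description from pipeline artifacts.
# The artifact keys are illustrative, not a standard schema.
def build_pr_body(artifacts: dict) -> str:
    sections = [
        ("What changed and why", artifacts.get("summary", "")),
        ("Original request", artifacts.get("request_link", "")),
        ("AI code review", artifacts.get("review_results", "")),
        ("Test and security results", artifacts.get("check_results", "")),
        ("Documentation", artifacts.get("doc_links", "")),
        ("Decisions made by the pipeline", artifacts.get("decisions", "")),
    ]
    # Skip sections with nothing to report rather than emitting empty headers.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)
```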
Step 10: CI/CD Pipeline Validation
Creating the pull request triggers the project's CI/CD pipeline — GitHub Actions, GitLab CI, Azure DevOps, or whatever the team uses. This is a separate layer of validation from the local testing in Step 6, and it catches a different class of problems.
The CI/CD pipeline runs in a clean, reproducible environment that mirrors production. It catches issues that can hide in local development: dependency resolution problems, environment-specific configuration failures, build errors that only surface in a fresh checkout, and integration issues with the target branch that didn't exist when the feature branch was created. It also runs the full project test suite — not just the tests related to the current change — to catch regressions across the entire codebase.
This is where branch protection rules and merge requirements are enforced at the platform level. The CI/CD pipeline should be configured to require passing status checks before a PR can be merged: all tests green, security scans clean, build successful. These are non-negotiable gates that exist independently of the AI pipeline — they're the same gates any human developer's PR would need to pass.
When CI/CD checks fail — and they will, especially in active codebases where the target branch has moved since the feature was started — the AI coding agent should pick up the failure, diagnose the issue, fix it, and push an update to the PR. Merge conflicts, failing integration tests against new code in the target branch, or build configuration issues all get resolved automatically. The human reviewer should never open a PR that has red checks.
Step 11: Preview Environment Deployment
As part of the CI/CD pipeline, a live preview environment is deployed automatically — a fully working instance of the application with the new feature included. This isn't a developer's localhost. It's a shareable URL that anyone in the organization can visit.
The business user who initiated the request can now share a working link with their team, their manager, or other stakeholders and say "try this." A VP can pull it up on their phone during lunch. A support lead can test it against a real workflow. Platforms like Vercel, Netlify, and most modern CI/CD systems support automatic preview deployments tied to pull requests. User-facing documentation generated in Step 8 gets updated here with screenshots and links from the live preview.
Step 12: AI Exploratory Testing
Automated test suites catch the problems you anticipated. Exploratory testing catches the ones you didn't.
This step uses an AI testing agent with browser automation — via Chrome DevTools Protocol, browser extensions, or tools like Playwright — to interact with the preview environment the way a real user would. The agent clicks buttons, fills forms, tests edge cases, and exercises user flows. It's looking for things unit tests miss: a button that doesn't respond, a layout that breaks at certain screen sizes, an error state that shows a raw stack trace, a form that accepts input it shouldn't.
The agent also validates against the original requirements and the approved prototype from Step 3 — closing the loop between what was promised and what was built. Because it has access to browser developer tools, it monitors for console errors, failed network requests, performance issues, and accessibility violations. Issues get fed back to the coding agent for correction, which triggers another pass through CI/CD before the PR is ready for human eyes.
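The monitoring half of that agent can be kept framework-free. This sketch collects browser events and flags the blocking ones; the comment shows roughly how it would be wired to Playwright's event hooks, which is an assumption about the setup rather than prescribed tooling.

```python
# The aggregation side of browser monitoring, kept framework-free so the
# logic is testable. With Playwright this would be wired up roughly as:
#   page.on("console", lambda msg: monitor.record(msg.type, msg.text))
#   page.on("requestfailed", lambda req: monitor.record("requestfailed", req.url))
class ConsoleMonitor:
    BLOCKING_TYPES = {"error", "requestfailed"}

    def __init__(self):
        self.events: list[tuple[str, str]] = []

    def record(self, kind: str, detail: str) -> None:
        self.events.append((kind, detail))

    def blocking_issues(self) -> list[str]:
        """Findings that should be fed back to the coding agent."""
        return [f"{kind}: {detail}" for kind, detail in self.events
                if kind in self.BLOCKING_TYPES]
```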
Step 13: Human Review and Merge
This is where the engineering team enters the picture. They review the PR the same way they'd review any other — checking for code quality, architectural fit, edge cases, and maintainability.
The difference is the baseline. The code has already passed local automated tests, AI code review, the full CI/CD pipeline, and AI exploratory testing against a live preview. The requirements are clear. A preview environment is running. All checks are green. The review becomes a final quality gate focused on the judgment calls that still require human expertise — subtle architectural decisions, long-term maintainability concerns, and business logic nuances that automated systems can't fully evaluate.
Configuring the AI Agents
The pipeline only works if the AI agents are properly configured. An out-of-the-box AI coding tool dropped into a repo with no context will produce mediocre results. The configuration layer is the engineering team's most important contribution — and most of it happens before a single business request ever comes in.
Project Identity: The Claude.md File
Every AI-assisted project should have a Claude.md file (or equivalent) at its root — the agent's onboarding document. It describes the project's purpose, tech stack, architectural patterns, naming conventions, testing philosophy, and non-negotiable rules.
This file should be refined continuously. If the agent keeps generating components with inline styles when the project uses CSS modules, that's a Claude.md refinement. Over time, it becomes a living document that encodes institutional knowledge in a form AI can consume — and its quality directly determines the pipeline's output quality.
Codebase Indexing
AI agents can't write code that fits your project if they can't efficiently navigate it. Source code indexing tools like Chunkhound scan the codebase and build a structured index that helps agents locate relevant files, understand module relationships, and identify patterns to follow. Without indexing, the agent searches your entire repository every time — slow, expensive, and prone to missing context. For larger codebases, this is the difference between the pipeline being practical and unusable.
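A toy illustration of what an index buys the agent — a symbol-to-location map so lookups replace whole-repo searches. Tools like Chunkhound do this at scale, across languages, with much richer structure; this sketch handles only top-level Python definitions.

```python
import ast

# Toy index: map each top-level function/class name to the file defining it,
# so an agent can jump straight to a symbol instead of searching the repo.
def index_source(files: dict[str, str]) -> dict[str, str]:
    index = {}
    for path, source in files.items():
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[node.name] = path
    return index
```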
Task Tracking and Context
In a multi-step pipeline, each agent needs to understand what's been done, what's in progress, and what comes next. Task tracking systems like Beads maintain a structured record of work as it flows through the pipeline, ensuring the coding agent has access to the full chain: the original request, refined requirements, approved prototype, architectural plan, and specific sub-task. Without this continuity, context gets lost between stages.
Skills: Reusable Agent Capabilities
Skills are packaged instruction sets that extend what agents can do — a skill for generating React components according to your design system, a skill for writing API endpoints that follow your patterns, a skill for creating migrations with your ORM conventions. The value compounds over time. When a team solves a recurring problem once and encodes it as a skill, every future pipeline run benefits.
Hooks: Pipeline Enforcement
Hooks are automated triggers that fire at specific points in the agent's workflow — before a file is saved, after code is generated, when a task completes. A pre-commit hook runs the security scanner automatically. A post-generation hook verifies naming conventions. A task-completion hook triggers the next pipeline stage. Hooks are what turn the pipeline from a series of suggestions into an enforced process.
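A post-generation hook's body might be a small script like this. The nonzero-exit-blocks contract follows common hook systems (git pre-commit hooks work the same way); the snake_case rule is just an example convention.

```python
import re
import sys

# Sketch of a post-generation hook: enforce a naming convention and exit
# nonzero to block the step. The snake_case rule is an example convention.
SNAKE_CASE = re.compile(r"^[a-z0-9_]+\.py$")

def check_filenames(filenames: list[str]) -> list[str]:
    """Return the names that violate the project's convention."""
    return [f for f in filenames if f.endswith(".py") and not SNAKE_CASE.match(f)]

if __name__ == "__main__":
    violations = check_filenames(sys.argv[1:])
    for name in violations:
        print(f"naming violation: {name}")
    sys.exit(1 if violations else 0)   # nonzero exit blocks the pipeline step
```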
Putting It Together
The engineering team's role isn't to write code — it's to create the environment in which AI agents can write good code: maintaining the Claude.md, keeping the index fresh, building skills, and wiring hooks. Instead of implementing features, senior engineers are curating the knowledge and guardrails that make every pipeline run reliable. It's higher-leverage work — the time invested pays off across every feature the pipeline produces.
How This Aligns With a Healthy SDLC
A reasonable reaction to this pipeline is skepticism: it sounds like it's bypassing the engineering discipline that mature teams have spent years building. It isn't. Every step maps directly to established SDLC and Scrum best practices. The pipeline doesn't skip them — it enforces them more consistently than most teams manage on their own.
Definition of Ready — Enforced, Not Aspirational
In most Scrum teams, the Definition of Ready is aspirational. Stories enter sprints half-baked because there's pressure to keep moving. This pipeline makes it structural — work literally cannot advance to code generation until the criteria are met.
Steps 1 through 4 enforce user story clarity, acceptance criteria, business value documentation, stakeholder approval, and attached designs — not as checkboxes a product owner clicks, but as pipeline gates the work must pass through. The AI requirements agent won't release a story until it's unambiguous. The architecture agent won't decompose work until technical feasibility is assessed and dependencies are identified. The prototype won't advance until the business user has interacted with it and approved the experience.
Every DoR criterion a healthy Scrum team would expect — clear stories, specified acceptance criteria, identified dependencies, attached mockups, feasibility assessment, security requirements, scoping, testing plans, and documentation needs — is addressed by a specific pipeline step before a single line of production code is written.
Definition of Done — Blocking, Not Advisory
The Definition of Done is where most teams have the biggest gap between aspiration and reality. Under deadline pressure, items get marked "done" with incomplete testing, missing documentation, or skipped security reviews. The pipeline eliminates this gap because every criterion is a blocking gate — or more precisely, every criterion becomes an AI agent. Uses components from your UI library? There's a UI library enforcement agent. Follows the project's API conventions? There's an architecture review agent. Has adequate test coverage? There's a testing agent that won't let the code advance until coverage thresholds are met. The Definition of Done stops being a checklist that humans sometimes skip and becomes a set of automated agents that can't be bypassed.
Specification alignment is verified by the AI code review and exploratory testing against the original requirements. Code review happens in two layers — AI and then human. Unit, integration, and performance testing are all enforced in the automated testing gate. Security validation runs on every change, not just the ones someone remembered to flag. The preview environment proves deployment readiness. The exploratory testing agent checks accessibility against WCAG standards. Documentation generation is a blocking step. The AI code review verifies that monitoring, logging, and analytics instrumentation are in place.
The criteria that teams typically struggle to enforce consistently — rollback plans, customer support documentation, business metrics instrumentation, compliance reviews — get either automated directly or flagged as required sign-offs before merge. The pipeline produces the rollback plan as part of the PR. It generates the support knowledge base articles. It verifies analytics are present. For items requiring human judgment (legal review, compliance sign-off), it flags the PR and won't proceed without confirmation.
Scrum Ceremonies, Accelerated
The pipeline maps cleanly to Scrum's core ceremonies. Sprint Planning corresponds to the architecture decomposition in Step 4. Daily Standup visibility comes from the task tracking layer, which shows where each request sits in the pipeline in real time. Sprint Review happens twice — once during prototype approval and again with the deployed preview environment — giving stakeholders two feedback loops instead of the traditional one. Sprint Retrospective maps to the ongoing refinement of agent configuration: when the team spots a recurring issue, they update the Claude.md, add a skill, or wire a new hook, and the improvement takes effect immediately across all future runs.
Beyond Standard Scrum
The pipeline addresses three persistent Scrum pain points. The handoff gap between business and engineering — eliminated, because business users interact directly with the pipeline. Inconsistent Definition of Done enforcement — impossible to skip, because every gate is blocking. The throughput bottleneck of limited engineering capacity — removed, because AI handles implementation while engineers focus on review and architecture.
One new concern the pipeline introduces is branch drift. When multiple business users are submitting requests in parallel, feature branches can diverge quickly, leading to merge conflicts. This is manageable with the same discipline any high-throughput team uses — small, well-scoped changes, frequent merges, and clear ownership of shared modules — but it's something to plan for from day one.
The engineering team's role shifts from executing sprint work to maintaining the system that executes sprint work. They're still accountable for quality, architecture, and technical direction. They're still the final reviewers. They're spending their time on the work that requires human judgment.
The Organizational Shift
This pipeline doesn't just change the technology. It changes roles. Business users become directly accountable for what gets built because they're defining requirements in real time, not tossing them over a wall. Engineers become reviewers and architects, focused on system integrity and strategic decisions. Product managers shift from translators to pipeline operators — tuning the system, improving intake, and monitoring quality.
It also changes the math on product validation. When the cost of building a feature drops to near zero, you stop debating whether to build it — you build it, ship it, and measure it. Analytics and observability become first-class pipeline outputs, not afterthoughts, because the bottleneck shifts from "can we build this?" to "did anyone use it?" The traditional fear of throwing away weeks of development work disappears when the pipeline can rebuild a feature in hours — as long as it's well documented. The 50-page product requirements document now takes longer to read than the pipeline takes to build the feature it describes. That inversion changes how organizations think about iteration, experimentation, and risk.
At its most mature, the pipeline starts to look like self-improving software. A customer support representative submits a feature request. The pipeline turns it into a structured requirement, generates the code, tests it, deploys a preview, and produces a PR — all before the product owner finishes reading the original request. That's not science fiction. Teams are doing it today for well-scoped changes in well-configured codebases.
Expect internal resistance. Sales teams, support teams, and even some engineers will see the time spent building the pipeline as time not spent building features. They'll worry that development has stopped. The reality is the opposite — you're building the system that will produce features faster than any team could manually — but that argument only lands once people see the pipeline working. Start small, show results early, and let the output speak for itself.
One practical note: the pipeline's own tooling can help with its own adoption. The same skills system that powers code generation can be used to build onboarding skills — automated walkthroughs that set up a new user's environment, clone the right repos, install dependencies, and configure credentials. If the pipeline can onboard a new developer in minutes instead of days, that's a compelling early proof point.
It's a meaningful change, and it won't happen overnight. But the technology to support it exists today. The organizations that figure out the process and culture around it will have a significant advantage in speed, cost, and alignment between what the business wants and what gets built.
Evolve Advising helps organizations design and implement AI-driven workflows that bridge the gap between business strategy and technical execution.