10.16 Staged Pipeline with Per-Stage Feedback

Sequential pipeline stages each have an embedded Critic agent. The Critic can accept the stage output (advance), request a retry within the stage, or route backward to a prior stage. The result is a set of multi-stage quality gates whose routing decisions can cross stage boundaries.

Motivating Scenario

A software development agent system builds features from specs. Without per-stage feedback, code generation proceeds with no inline review, and 40% of features require rework after integration testing. The rework is discovered too late - by the time the full pipeline has executed, the error is expensive to fix.

With staged feedback: the Planner output is reviewed by a Plan Critic, which catches infeasible specs before coding begins. The Coder output is reviewed by a Code Critic, which catches bugs before the test suite runs. The Test Runner output is reviewed by a Test Critic, which catches coverage gaps before deployment. Result: an 8% rework rate post-deployment - 2.3x more pipeline stage executions, but 3.1x less total rework compute. Each Critic is a quality gate that shifts defect detection left: the earlier in the pipeline a bug is caught, the cheaper it is to fix.

Structure

Concrete example: software development agent pipeline with per-stage critics

Key Metrics

| Metric | Signal |
| --- | --- |
| Mean iterations per stage | Baseline efficiency signal - a rising mean at any stage indicates Critic over-strictness or Generator quality degradation at that stage |
| Cross-stage backtrack rate | % of critic decisions that route to a prior stage. Target ≤15%; above 25% indicates systemic upstream quality failure, not localized errors |
| Stage completion rate within budget | % of features completing each stage within the retry budget. A low completion rate signals critic calibration problems or generator quality regression |
| Per-stage critic accuracy | Does critic rejection predict downstream failure? Measure by tracking whether critic-passed outputs fail at later stages - low predictive accuracy means the critic is not catching the right defect class |
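These metrics can be computed directly from a critic decision log. A minimal sketch in Python - the `Decision` record and verdict names are illustrative assumptions, not part of the pattern:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Decision:
    stage: str    # stage at which the critic ran, e.g. "plan", "code", "test"
    verdict: str  # "accept" (advance), "retry" (redo in-stage), "backtrack" (prior stage)

def backtrack_rate(decisions: List[Decision]) -> float:
    """Fraction of critic decisions that route to a prior stage (target <= 0.15)."""
    if not decisions:
        return 0.0
    back = sum(1 for d in decisions if d.verdict == "backtrack")
    return back / len(decisions)
```

The same log supports the other metrics: group by `stage` for mean iterations, and join critic accepts against later-stage failures for per-stage critic accuracy.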
| Node | What it does | What it receives | What it produces |
| --- | --- | --- | --- |
| Planner | Decomposes feature spec into implementation plan: tasks, interfaces, acceptance criteria. On retry, incorporates Plan Critic feedback. | Feature spec (+ Plan Critic feedback on retry) | Structured implementation plan: task list, interface contracts, test criteria |
| Plan Critic | Checks plan feasibility: missing interfaces, ambiguous acceptance criteria, underspecified edge cases. Routes forward on accept, back to Planner on redo, or flags spec-level issues for upstream escalation. | Implementation plan + spec + code standards | Accept verdict OR structured feedback: {issue, location, required change} |
| Coder | Implements the plan. On retry from Code Critic, incorporates specific issue list. On back-route from Code Critic, waits for revised plan from Planner. | Implementation plan (+ Code Critic feedback on retry) | Code artifact: implementation + inline documentation |
| Code Critic | Checks code for correctness, style compliance, test coverage hooks. Does not run code - static analysis only. Routes forward on accept, back to Coder on redo, or back to Planner if plan is root cause. | Code artifact + implementation plan + coding standards | Accept verdict OR structured issue list with routing decision: {redo \| back-to-plan} |
| Test Runner | Executes test suite against code. Generates coverage report and failure trace. On retry, re-runs with expanded test scope per Test Critic guidance. | Code artifact + test criteria from plan | Test report: pass/fail per criterion, coverage percentage, failure traces |
| Test Critic | Checks test coverage against acceptance criteria. Identifies untested paths and insufficient assertions. Routes forward on accept, back to Test Runner for expanded coverage, or back to Coder if tests reveal implementation bugs. | Test report + acceptance criteria + coverage targets | Accept verdict OR routing decision: {redo-test \| back-to-code} + specific gap list |
| Deploy Agent | Packages and deploys the feature. Emits deployment confirmation with artifact reference. | Approved code artifact + approved test report | Deployed feature: artifact ID, deployment timestamp, rollback reference |
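The routing semantics in the node descriptions above reduce to a driver loop over the stage sequence. A minimal sketch, assuming a shared state dict and a `(verdict, target)` tuple convention for critic output; retry and cross-stage budgets, which a production version needs, are deliberately omitted here:

```python
from typing import Callable, Dict, Optional, Tuple

# A critic returns ("accept", None), ("redo", None),
# or ("back", "<prior stage name>"). Convention is illustrative.
Verdict = Tuple[str, Optional[str]]

STAGES = ["plan", "code", "test", "deploy"]

def run_pipeline(workers: Dict[str, Callable[[dict], object]],
                 critics: Dict[str, Callable[[dict], Verdict]],
                 spec: dict) -> dict:
    """Run stages in order; critics gate each transition."""
    state: dict = {"spec": spec}
    i = 0
    while i < len(STAGES):
        stage = STAGES[i]
        state[stage] = workers[stage](state)   # worker sees all prior outputs
        critic = critics.get(stage)            # deploy has no critic
        if critic is None:
            i += 1
            continue
        verdict, target = critic(state)
        if verdict == "accept":
            i += 1                             # advance to next stage
        elif verdict == "redo":
            continue                           # retry within the same stage
        else:                                  # "back": route to a prior stage
            i = STAGES.index(target)
    return state
</ ```

Note that each worker and critic receives the whole `state` dict, which is the context-sharing discipline the stage-isolation failure mode below argues for.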

When to Use

Use when
Avoid when

Value Profile

| Origin of Value | Where it appears | How it is captured |
| --- | --- | --- |
| Future Cashflow | Output quality vs. pipeline cost | Value realized as rework reduction post-deployment. The economic case requires that critic compute cost at each stage is less than the expected rework cost it prevents. At 8% vs 40% post-deployment rework rate, the 3.1x compute saving is the primary value argument. |
| Governance | Each Critic node | Each Critic encodes stage-specific quality policy. The Plan Critic enforces planning standards; the Code Critic enforces coding standards; the Test Critic enforces coverage policy. The Critic ensemble is the organization's quality governance layer - changes to standards are changes to Critics. |
| Risk Exposure | Cross-stage backtrack rate | Too-high backtrack rate means the system is unreliable and expensive. Too-low backtrack rate means Critics are calibrated too loosely and defects pass through. Target: ≤15% cross-stage backtracks; ≤30% within-stage retries. Rates outside these bounds signal critic calibration problems. |
| Conditional Action | Critic compute at every stage | Critics are always-on cost. Unlike the pipeline workers (which are fixed per execution), every critic runs on every output - including good outputs. Critic cost is proportional to output volume, not defect rate. In high-throughput systems, critic compute can exceed worker compute. |
VCM analog: Work Token chain with quality gates. Each stage transition requires a Critic co-signature. The Critic is a staking mechanism - it puts its judgment on the line at each gate. A Critic that consistently passes defective outputs is a staker that consistently approves bad proposals - its authority should diminish over time.

Dynamics and Failure Modes

Cross-stage loop explosion

The Code Critic, finding a bug in the implementation, routes back to the Planner (judging the plan as root cause). The Planner produces a revised plan. The Coder implements it. The Code Critic finds a different bug and routes back to the Planner again. Without an iteration budget spanning the full cross-stage path, this creates an unbounded loop that never converges. Fix: implement a cross-stage iteration counter that persists across stage boundaries. A feature that has been routed backward across stages more than N times (e.g., 3) is escalated to human review - it is not a routing problem, it is a spec problem that no automated loop can resolve.
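The fix above hinges on the counter living on the feature record, so that it persists across stage boundaries rather than resetting when routing changes stages. A sketch - the field name, function name, and escalation sentinel are assumptions for illustration:

```python
MAX_CROSS_STAGE = 3  # the N from the text; beyond this, escalate to human review

def route_backward(feature: dict, target_stage: str) -> str:
    """Allow a backward route, or force human escalation once the
    cross-stage budget is exhausted.

    feature["cross_stage_hops"] is carried on the feature itself, so it
    survives plan -> code -> plan -> code cycles that per-stage retry
    counters would miss.
    """
    feature["cross_stage_hops"] = feature.get("cross_stage_hops", 0) + 1
    if feature["cross_stage_hops"] > MAX_CROSS_STAGE:
        return "escalate-to-human"  # a spec problem, not a routing problem
    return target_stage
```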

Critic calibration drift

After 200 features, the Code Critic has been shown many "good enough" outputs that passed human review. Its effective threshold has drifted - it now accepts code that it would have rejected in week 1. The drift is invisible in daily metrics because the system is producing outputs, but post-deployment defect rates start climbing. Fix: monthly recalibration cycle. Sample 50 critic decisions (25 accept, 25 reject) and have a human engineer score them independently. Critic accuracy vs. human judgment is the calibration metric - if accuracy drops below 85%, retraining is triggered.
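The monthly recalibration cycle reduces to a sampling-and-agreement computation. A sketch, assuming decisions are logged as `(output_id, verdict)` pairs and an independent human scoring function is available; all names here are illustrative:

```python
import random
from typing import Callable, List, Tuple

def calibration_accuracy(decisions: List[Tuple[str, str]],
                         human_score: Callable[[str], str],
                         sample_per_class: int = 25,
                         seed: int = 0) -> float:
    """Sample accepts and rejects, score agreement with human judgment.

    Sampling both classes separately (25 accept, 25 reject, per the text)
    avoids the drift going unnoticed because accepts dominate the log.
    """
    rng = random.Random(seed)
    accepts = [d for d in decisions if d[1] == "accept"]
    rejects = [d for d in decisions if d[1] == "reject"]
    sample = (rng.sample(accepts, min(sample_per_class, len(accepts)))
              + rng.sample(rejects, min(sample_per_class, len(rejects))))
    agree = sum(1 for output_id, verdict in sample
                if human_score(output_id) == verdict)
    return agree / len(sample)

# Policy from the text: accuracy < 0.85 triggers critic retraining.
```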

Stage isolation violation

The Test Critic identifies a coverage gap and wants to diagnose root cause: is it a test problem or a code problem? It cannot see the implementation plan (stage 1 output) - it only receives the code artifact and test report. It lacks the context to route correctly. It sends everything back to the Coder as a guess. The Coder rewrites working code when the real issue was an underspecified acceptance criterion in the plan. Fix: each Critic receives the original spec and all prior stage outputs as context, not only the immediately preceding output. Isolation is a performance optimization, not a correctness requirement - do not sacrifice diagnostic accuracy for context frugality.

Forward progress starvation

A strict Test Critic rejects 90% of test reports as "insufficient coverage." Each rejection routes back to the Test Runner, which re-runs with expanded scope. The feature never advances to deployment - every output triggers another retry. The system is correct in identifying coverage gaps but has no progress mechanism. Fix: within-stage retry budgets enforced at the Critic level. After N retries, the Critic must either accept with documented gaps or escalate to human review. "Never advance" is not a valid Critic decision - every gate must have a forced-advance path.
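A retry budget with a forced-advance path can be enforced as a wrapper around any critic, so no individual critic has to implement it. A sketch, under the assumptions that critics return `(verdict, detail)` tuples and features carry stable IDs:

```python
from typing import Callable, Dict, Tuple

def budgeted_critic(critic: Callable[[object], Tuple[str, object]],
                    max_retries: int = 3):
    """Wrap a critic so that 'never advance' is impossible.

    After max_retries consecutive redo verdicts for the same feature,
    the wrapper forces accept-with-documented-gaps. (An escalate-to-human
    branch could be substituted here; names are illustrative.)
    """
    counts: Dict[str, int] = {}

    def gate(feature_id: str, output: object) -> Tuple[str, object]:
        verdict, detail = critic(output)
        if verdict != "redo":
            counts.pop(feature_id, None)   # budget resets on accept/backtrack
            return verdict, detail
        counts[feature_id] = counts.get(feature_id, 0) + 1
        if counts[feature_id] >= max_retries:
            counts.pop(feature_id, None)
            return "accept-with-gaps", detail   # the forced-advance path
        return "redo", detail

    return gate
```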

Variants

| Variant | Modification | When to use |
| --- | --- | --- |
| Critic-as-Veto | Critics can only block (force retry within stage), not route backward - simpler routing graph with no cross-stage edges | Cross-stage routing adds complexity that exceeds its value - when defects are almost always localized to the current stage and backward routing rarely produces better results |
| Budget-Bounded Feedback | Each stage has a max retry count; on budget exhaustion the Critic must accept with documented gaps or escalate - forced-advance prevents starvation | Production systems with throughput SLAs - forward progress guarantees are non-negotiable and human escalation is a defined, acceptable outcome |
| Cascading Critics | Each Critic passes its judgment to the next Critic before a final gate decision - the Test Critic sees what the Plan Critic and Code Critic flagged before making its routing decision | Downstream critics need upstream quality context to diagnose root cause - prevents the stage isolation violation failure mode at the cost of tighter inter-critic coupling |

Related Patterns

| Pattern | Relationship |
| --- | --- |
| 10.11 Pipeline | Base structure without feedback - use when stage quality is sufficient without critics and rework cost is acceptable |
| 10.15 Evaluator-Optimizer | Single-stage feedback loop - use when only one stage requires iterative refinement rather than quality gates at every stage |
| 30.31 Feedback Loop | Feedback over time vs. feedback within a single execution - Critics in this pattern make per-execution routing decisions; Feedback Loop aggregates decisions across executions to improve models |

Investment Signal

The Critic ensemble is the firm's quality standard encoded as software. A pipeline that runs with no per-stage critics is a pipeline that cannot tell you where quality is lost - it produces outputs, and humans discover quality problems downstream. A pipeline with calibrated critics has an auditable quality profile: per-stage defect rates, critic accuracy, and backtrack distributions are all measurable.

Acquirers should ask: are the critics calibrated against ground truth, or are they hand-crafted heuristics? Hand-crafted critics drift and degrade as the underlying task distribution shifts. Critics trained against historical labeled outputs compound over time - they are organizational knowledge, not code.

Red flag: a system where the total cross-stage backtrack rate is below 1% has critics that are either too lenient (passing defects through) or irrelevant (the pipeline does not produce defects at those stages). Neither is evidence of a well-functioning quality layer. Target: critics that block 10-30% of outputs within-stage and route 5-15% backward. A critic that never blocks is not a critic - it is overhead.