80.81 Feedback Loop

System outputs produce real-world outcomes. Outcomes are measured. The measurement signal updates the system — model weights, prompts, retrieval stores, or routing thresholds. The firm learns continuously from its own operation.


Motivating Scenario

A legal AI startup deploys a contract review pipeline (see 10.11). After 6 months, it has 40,000 reviewed contracts, each with an outcome: did the client accept the AI's recommendation? Did they override it? If they overrode it, what did they change, and was the change materially better?

Without a Feedback Loop, this data sits in logs. The model that reviewed contract #40,000 is identical to the model that reviewed contract #1. The firm has been operating for 6 months and has learned nothing.

With a Feedback Loop: every override is captured as a labeled training example. The signal processing stage separates overrides driven by client preference (not model error) from overrides where the model was objectively wrong. The model update stage fine-tunes the Critic on high-confidence corrections monthly. Six months in, the Critic's accuracy has improved from 87% to 94%: not because the model was upgraded, but because the firm's own operation trained it.

Structure

Concrete example: contract review pipeline with outcome-driven learning

Key Metrics

| Metric | Signal |
| --- | --- |
| Override rate trend | Primary signal: a declining rate indicates the loop is working; a rising rate on recent contracts signals distribution shift |
| Signal quality ratio | Error-driven overrides / total overrides; should be >60% for the training signal to be meaningful |
| Performance improvement per cycle | Critic accuracy delta after each monthly update; validates that the loop is generating value |
| A/B test win rate | Updated model outperforms the prior version; required before full deployment of each update |
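The two leading metrics can be computed directly from the override log. A minimal sketch, assuming a hypothetical `Override` record; the field names and schema are illustrative, not a real log format:

```python
from dataclasses import dataclass

# Hypothetical override record; field names are assumptions for illustration.
@dataclass
class Override:
    contract_id: str
    error_driven: bool  # True when the model was objectively wrong
    recent: bool        # produced inside the current monitoring window

def signal_quality_ratio(overrides):
    """Error-driven overrides / total overrides; should stay above 0.6."""
    if not overrides:
        return 0.0
    return sum(o.error_driven for o in overrides) / len(overrides)

def override_rates(overrides, total_recent, total_historical):
    """Override rate on recent vs. historical contracts, tracked separately
    so a rising recent rate is not masked by a flat blended rate."""
    recent = sum(o.recent for o in overrides)
    historical = len(overrides) - recent
    return recent / total_recent, historical / total_historical
```

Tracking the two rates separately, rather than one blended number, is what makes the override rate trend usable as a distribution-shift indicator.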
| Node | What it does | Input | Output |
| --- | --- | --- | --- |
| Contract Review Pipeline | The operational system: the 10.11 Pipeline running in production. Produces contract recommendations that clients act on. | Contract + deal parameters | Recommendation delivered to client |
| Outcome Tracker | Captures what the client actually did: accepted the AI recommendation, overrode it, or requested changes. Records the delta between the AI output and the final executed contract. | Client actions on AI recommendations | Override log: {contract_id, AI_recommendation, client_action, delta, timestamp} |
| Signal Processor | Filters the override log: separates preference-driven overrides (client style choices) from error-driven overrides (the model was wrong). Labels high-confidence corrections. Deduplicates similar patterns. | Override log + human labeling (sample) | Training signal: {input, correct_output, confidence} tuples |
| Model Updater | Determines the update type: fine-tune the Critic on the new training signal, update the Critic's criteria list, or adjust confidence thresholds. Runs a monthly batch update. A/B tests the updated model before full deployment. | Training signal + current model + A/B test results | Updated Critic deployed to the pipeline |
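The four nodes above compose into one monthly batch cycle. A minimal sketch of that cycle; `run_feedback_cycle`, `fine_tune`, `classify`, and `ab_test_wins` are all hypothetical names, and the real stages would be far more involved:

```python
def run_feedback_cycle(override_log, current_critic, classify, ab_test_wins):
    """One monthly cycle: filter the override log into a training signal,
    fine-tune the Critic, and deploy the candidate only if it wins an
    A/B test against the current version."""
    # Signal Processor: keep only error-driven overrides as corrections.
    signal = [
        {"input": o["AI_recommendation"], "correct_output": o["client_action"]}
        for o in override_log
        if classify(o) == "error"  # calibrated against a human-labeled sample
    ]
    # Model Updater: fine-tune on the cleaned signal, gate on the A/B test.
    candidate = current_critic.fine_tune(signal)
    return candidate if ab_test_wins(candidate, current_critic) else current_critic
```

The A/B gate is the important design choice: a fine-tuned candidate that loses the test never reaches production, so a bad training signal costs a cycle rather than degrading the live system.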

When to Use

| Use when | Avoid when |
| --- | --- |
| System outputs produce measurable outcomes attributable to specific outputs, and operation volume is high enough to accumulate a meaningful training signal | Outcomes take so long to become observable that no reliable proxy exists, or overrides are dominated by client preference rather than model error |

Value Profile

| Origin of Value | Where it appears | How it is captured |
| --- | --- | --- |
| Future Cashflow | Compounding over operation time | Performance improves with each feedback cycle. A firm that has operated for 2 years with a clean feedback loop produces better output than one running the same base model for 2 years without one. The gap widens monotonically. |
| Governance | Model Updater node | The feedback signal shapes the future behavior of all agents in the system. It is the highest-authority update mechanism: more powerful than any prompt change, because it alters model weights, not just runtime context. |
| Representation | Training signal accumulation | The accumulated override log represents the firm's operational history encoded as model knowledge. This asset grows with operation and cannot be replicated by a competitor starting fresh; it is the institutional memory of the system. |
| Risk Exposure | Signal Processor node | A miscalibrated Signal Processor that mislabels preference-driven overrides as model errors will actively degrade the system. Distribution shift: the loop optimizes for past client behavior, not future behavior. Both risks compound silently. |

VCM analog: Staking with compounding governance (veToken model). Like CRV/CVX, where continuous participation generates compounding governance rights: the firm's operational history is the "stake," each feedback cycle increases the model's influence over future behavior, and value compounds monotonically with time in operation.

Dynamics and Failure Modes

Distribution shift — optimizing for the past

The Signal Processor trains the Critic on overrides from 18 months ago. Client expectations have shifted: they now require stronger data privacy clauses that were rare in the training data. The Critic still evaluates contracts against the old standard and passes clauses that clients now routinely override. The feedback loop is running, but it is learning the wrong thing. Fix: monitor the override rate on recently produced contracts separately from the historical rate. A rising override rate on recent contracts is the leading indicator of distribution shift, even if the overall override rate is flat.
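The recommended monitor can be sketched in a few lines. The `(timestamp, was_overridden)` schema, the 90-day window, and the 2-point tolerance are illustrative assumptions:

```python
from datetime import datetime, timedelta

def split_override_rates(reviews, now, window_days=90):
    """Override rate inside the last `window_days` vs. everything older.
    `reviews` is a list of (timestamp, was_overridden) pairs."""
    cutoff = now - timedelta(days=window_days)
    recent = [flag for ts, flag in reviews if ts >= cutoff]
    older = [flag for ts, flag in reviews if ts < cutoff]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(recent), rate(older)

def distribution_shift_alert(recent_rate, historical_rate, tolerance=0.02):
    # A recent rate rising above the historical rate is the leading
    # indicator, even when the blended overall rate looks flat.
    return recent_rate - historical_rate > tolerance
```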

Reward hacking the training signal

The Signal Processor incorrectly labels a pattern of client overrides as model errors when they are actually client style preferences (e.g., clients in one jurisdiction prefer passive voice in indemnification clauses). The Critic is fine-tuned to produce passive voice in those clauses. This "fixes" the override metric without improving actual contract quality. Fix: the Signal Processor must distinguish error-driven from preference-driven overrides. A human labeling sample of 200 overrides per month is sufficient to calibrate the Signal Processor's classification.
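The monthly human-labeled sample gives a direct calibration check: of the overrides the Signal Processor labeled as errors, how many did human labelers agree with? A minimal sketch; the function name and label strings are assumptions:

```python
def error_label_precision(auto_labels, human_labels, target="error"):
    """Of the overrides the Signal Processor labeled `target`, the fraction
    human labelers agreed with. A low value means preference-driven
    overrides are leaking into the training signal."""
    flagged = [(a, h) for a, h in zip(auto_labels, human_labels) if a == target]
    if not flagged:
        return 1.0  # nothing flagged, nothing leaking
    return sum(h == target for _, h in flagged) / len(flagged)
```

Run against the 200-override monthly sample, a falling precision is the signal to recalibrate the Signal Processor before the next fine-tune, not after.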

Feedback latency mismatch

Contract outcomes (was the client satisfied? did the clause hold up in a dispute?) take 6-18 months to become observable. The monthly update cycle uses override data as a proxy — but overrides are noisy proxies for actual contract quality. A Critic that minimizes overrides may still produce contracts that underperform in disputes. Fix: track leading indicators (override rate) and lagging outcomes (dispute performance, client renewal) separately. Weight training signal by outcome quality, not only override frequency.
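Weighting the training signal by outcome quality can be sketched as follows. The dict schemas and the 0.3 discounted default for not-yet-observable outcomes are illustrative assumptions:

```python
def weighted_training_signal(corrections, outcome_quality, default_weight=0.3):
    """Attach a weight to each correction: the lagging outcome quality
    (dispute performance, client renewal) when it is already observable,
    or a discounted default while it is not.
    `corrections` maps contract_id -> training example;
    `outcome_quality` maps contract_id -> quality score in [0, 1]."""
    return [
        (example, outcome_quality.get(cid, default_weight))
        for cid, example in corrections.items()
    ]
```

The effect is that corrections validated by a lagging outcome dominate the fine-tune, while fresh overrides, a noisier proxy, contribute at a discount instead of being dropped entirely.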

Variants

| Variant | Modification | When to use |
| --- | --- | --- |
| Online Learning Loop | System updates after every interaction, not in monthly batches | High-volume environment where monthly cycles are too slow; update cost per interaction is low (prompt update, not fine-tuning) |
| Human-Labeled Loop | Outcome Tracker routes a sample to human labelers before the Signal Processor | Ground truth requires expert judgment; accept higher latency for signal quality (legal, medical, compliance domains) |
| Multi-Signal Loop | Multiple independent outcome signals combined: override rate + client satisfaction + dispute outcomes | No single metric captures the full outcome; diversify to reduce reward hacking risk and improve signal quality |
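A Multi-Signal Loop needs a rule for blending its signals. A minimal weighted-sum sketch; the function name and the weights are illustrative assumptions:

```python
def combined_outcome_score(override_rate, satisfaction, dispute_win_rate,
                           weights=(0.4, 0.3, 0.3)):
    """Blend three independent signals so no single metric can be
    reward-hacked in isolation. Override rate is inverted (lower is
    better); satisfaction and dispute win rate are in [0, 1]."""
    w_ovr, w_sat, w_disp = weights
    return (w_ovr * (1 - override_rate)
            + w_sat * satisfaction
            + w_disp * dispute_win_rate)
```

Because the score is a blend, gaming the override rate alone (as in the passive-voice example above) moves it far less than improving the underlying contract quality would.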

Related Patterns

| Pattern | Relationship |
| --- | --- |
| 10.11 Pipeline | The operational system that the Feedback Loop wraps: the Pipeline produces outputs; the Loop learns from outcomes |
| 10.15 Evaluator-Optimizer | Single-session analog: Evaluator-Optimizer is a micro-feedback loop scoped to one task; the Feedback Loop operates across thousands of tasks over months |

Investment Signal

Firms with clean feedback loops compound over time. Each operation makes the system better — value increases with time in production, not with headcount or capital raised. This is the clearest analog to network effects in AI-native organizations: the more you use the system, the better it gets, and the better it gets, the more you use it.

The valuation premium is justified if, and only if: (1) the feedback signal is clean, measuring actual outcomes rather than proxies; (2) the learning rate is measurable, with improvement per feedback cycle quantified; and (3) the loop is stable, with no accelerating distribution shift.

The single most important due diligence question for an AI-native firm: show me a performance-over-time chart for your core AI system. If performance does not measurably improve with operation time, the feedback loop either does not exist or does not work. All other moat claims are secondary to this one.