80.81 Feedback Loop

System outputs produce real-world outcomes. Outcomes are measured. The measurement signal updates the system — model weights, prompts, retrieval stores, or routing thresholds. The firm learns continuously from its own operation.


Motivating Scenario

A legal AI startup deploys a contract review pipeline (see 10.11). After 6 months, it has 40,000 reviewed contracts, each with an outcome: did the client accept the AI's recommendation? Did they override it? If they overrode it, what did they change, and was the change materially better?

Without a Feedback Loop, this data sits in logs. The model that reviewed contract #40,000 is identical to the model that reviewed contract #1. The firm has been operating for 6 months and has learned nothing.

With a Feedback Loop: every override is captured as a labeled training example. The signal processing stage separates overrides driven by client preference (not model error) from overrides where the model was objectively wrong. The model update stage fine-tunes the Critic on high-confidence corrections monthly. Six months in, the Critic's accuracy has improved from 87% to 94%: not because the model was upgraded, but because the firm's own operation trained it.

Structure

Concrete example: contract review pipeline with outcome-driven learning

Key Metrics

| Metric | Signal |
| --- | --- |
| Override rate trend | Primary signal: a declining rate indicates the loop is working; a rising rate on recent contracts signals distribution shift |
| Signal quality ratio | Error-driven overrides / total overrides; should be >60% for the training signal to be meaningful |
| Performance improvement per cycle | Critic accuracy delta after each monthly update; validates that the loop is generating value |
| A/B test win rate | Updated model outperforms the prior version; required before full deployment of each update |
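The two leading metrics can be computed directly from the override log. A minimal sketch, assuming a hypothetical `Override` record; the field names and schema are illustrative, not a real log format:

```python
from dataclasses import dataclass

# Hypothetical override record; field names are assumptions for illustration.
@dataclass
class Override:
    contract_id: str
    error_driven: bool  # True when the model was objectively wrong
    recent: bool        # produced inside the current monitoring window

def signal_quality_ratio(overrides):
    """Error-driven overrides / total overrides; should stay above 0.6."""
    if not overrides:
        return 0.0
    return sum(o.error_driven for o in overrides) / len(overrides)

def override_rates(overrides, total_recent, total_historical):
    """Override rate on recent vs. historical contracts, tracked separately
    so a rising recent rate is not masked by a flat blended rate."""
    recent = sum(o.recent for o in overrides)
    historical = len(overrides) - recent
    return recent / total_recent, historical / total_historical
```

Tracking the two rates separately, rather than one blended number, is what makes the override rate trend usable as a distribution-shift indicator.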
| Node | What it does | Input | Output |
| --- | --- | --- | --- |
| Contract Review Pipeline | The operational system: the 10.11 Pipeline running in production. Produces contract recommendations that clients act on. | Contract + deal parameters | Recommendation delivered to client |
| Outcome Tracker | Captures what the client actually did: accepted the AI recommendation, overrode it, or requested changes. Records the delta between the AI output and the final executed contract. | Client actions on AI recommendations | Override log: {contract_id, AI_recommendation, client_action, delta, timestamp} |
| Signal Processor | Filters the override log: separates preference-driven overrides (client style choices) from error-driven overrides (the model was wrong). Labels high-confidence corrections. Deduplicates similar patterns. | Override log + human labeling (sample) | Training signal: {input, correct_output, confidence} tuples |
| Model Updater | Determines the update type: fine-tune the Critic on the new training signal, update the Critic's criteria list, or adjust confidence thresholds. Runs a monthly batch update. A/B tests the updated model before full deployment. | Training signal + current model + A/B test results | Updated Critic deployed to the pipeline |
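The four nodes above compose into one monthly batch cycle. A minimal sketch of that cycle; `run_feedback_cycle`, `fine_tune`, `classify`, and `ab_test_wins` are all hypothetical names, and the real stages would be far more involved:

```python
def run_feedback_cycle(override_log, current_critic, classify, ab_test_wins):
    """One monthly cycle: filter the override log into a training signal,
    fine-tune the Critic, and deploy the candidate only if it wins an
    A/B test against the current version."""
    # Signal Processor: keep only error-driven overrides as corrections.
    signal = [
        {"input": o["AI_recommendation"], "correct_output": o["client_action"]}
        for o in override_log
        if classify(o) == "error"  # calibrated against a human-labeled sample
    ]
    # Model Updater: fine-tune on the cleaned signal, gate on the A/B test.
    candidate = current_critic.fine_tune(signal)
    return candidate if ab_test_wins(candidate, current_critic) else current_critic
```

The A/B gate is the important design choice: a fine-tuned candidate that loses the test never reaches production, so a bad training signal costs a cycle rather than degrading the live system.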

When to Use

| Use when | Avoid when |
| --- | --- |
| System outputs produce measurable outcomes attributable to specific outputs, and operation volume is high enough to accumulate a meaningful training signal | Outcomes take so long to become observable that no reliable proxy exists, or overrides are dominated by client preference rather than model error |

Value Profile

| Origin of Value | Where it appears | How it is captured |
| --- | --- | --- |
| Future Cashflow | Compounding over operation time | Performance improves with each feedback cycle. A firm that has operated for 2 years with a clean feedback loop produces better output than one running the same base model for 2 years without one. The gap widens monotonically. |
| Governance | Model Updater node | The feedback signal shapes the future behavior of all agents in the system. It is the highest-authority update mechanism: more powerful than any prompt change, because it alters model weights, not just runtime context. |
| Representation | Training signal accumulation | The accumulated override log represents the firm's operational history encoded as model knowledge. This asset grows with operation and cannot be replicated by a competitor starting fresh; it is the institutional memory of the system. |
| Risk Exposure | Signal Processor node | A miscalibrated Signal Processor that mislabels preference-driven overrides as model errors will actively degrade the system. Distribution shift: the loop optimizes for past client behavior, not future behavior. Both risks compound silently. |

VCM analog: Staking with compounding governance (veToken model). Like CRV/CVX, where continuous participation generates compounding governance rights: the firm's operational history is the "stake," each feedback cycle increases the model's influence over future behavior, and value compounds monotonically with time in operation.

Dynamics and Failure Modes

Distribution shift — optimizing for the past

The Signal Processor trains the Critic on overrides from 18 months ago. Client expectations have shifted: they now require stronger data privacy clauses that were rare in the training data. The Critic still evaluates contracts against the old standard and passes clauses that clients now routinely override. The feedback loop is running, but it is learning the wrong thing. Fix: monitor the override rate on recently produced contracts separately from the historical rate. A rising override rate on recent contracts is the leading indicator of distribution shift, even if the overall override rate is flat.
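The recommended monitor can be sketched in a few lines. The `(timestamp, was_overridden)` schema, the 90-day window, and the 2-point tolerance are illustrative assumptions:

```python
from datetime import datetime, timedelta

def split_override_rates(reviews, now, window_days=90):
    """Override rate inside the last `window_days` vs. everything older.
    `reviews` is a list of (timestamp, was_overridden) pairs."""
    cutoff = now - timedelta(days=window_days)
    recent = [flag for ts, flag in reviews if ts >= cutoff]
    older = [flag for ts, flag in reviews if ts < cutoff]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(recent), rate(older)

def distribution_shift_alert(recent_rate, historical_rate, tolerance=0.02):
    # A recent rate rising above the historical rate is the leading
    # indicator, even when the blended overall rate looks flat.
    return recent_rate - historical_rate > tolerance
```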

Reward hacking the training signal

The Signal Processor incorrectly labels a pattern of client overrides as model errors when they are actually client style preferences (e.g., clients in one jurisdiction prefer passive voice in indemnification clauses). The Critic is fine-tuned to produce passive voice in those clauses. This "fixes" the override metric without improving actual contract quality. Fix: the Signal Processor must distinguish error-driven from preference-driven overrides. A human labeling sample of 200 overrides per month is sufficient to calibrate the Signal Processor's classification.
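The monthly human-labeled sample gives a direct calibration check: of the overrides the Signal Processor labeled as errors, how many did human labelers agree with? A minimal sketch; the function name and label strings are assumptions:

```python
def error_label_precision(auto_labels, human_labels, target="error"):
    """Of the overrides the Signal Processor labeled `target`, the fraction
    human labelers agreed with. A low value means preference-driven
    overrides are leaking into the training signal."""
    flagged = [(a, h) for a, h in zip(auto_labels, human_labels) if a == target]
    if not flagged:
        return 1.0  # nothing flagged, nothing leaking
    return sum(h == target for _, h in flagged) / len(flagged)
```

Run against the 200-override monthly sample, a falling precision is the signal to recalibrate the Signal Processor before the next fine-tune, not after.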

Feedback latency mismatch

Contract outcomes (was the client satisfied? did the clause hold up in a dispute?) take 6-18 months to become observable. The monthly update cycle uses override data as a proxy — but overrides are noisy proxies for actual contract quality. A Critic that minimizes overrides may still produce contracts that underperform in disputes. Fix: track leading indicators (override rate) and lagging outcomes (dispute performance, client renewal) separately. Weight training signal by outcome quality, not only override frequency.
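Weighting the training signal by outcome quality can be sketched as follows. The dict schemas and the 0.3 discounted default for not-yet-observable outcomes are illustrative assumptions:

```python
def weighted_training_signal(corrections, outcome_quality, default_weight=0.3):
    """Attach a weight to each correction: the lagging outcome quality
    (dispute performance, client renewal) when it is already observable,
    or a discounted default while it is not.
    `corrections` maps contract_id -> training example;
    `outcome_quality` maps contract_id -> quality score in [0, 1]."""
    return [
        (example, outcome_quality.get(cid, default_weight))
        for cid, example in corrections.items()
    ]
```

The effect is that corrections validated by a lagging outcome dominate the fine-tune, while fresh overrides, a noisier proxy, contribute at a discount instead of being dropped entirely.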

Variants

| Variant | Modification | When to use |
| --- | --- | --- |
| Online Learning Loop | System updates after every interaction, not in monthly batches | High-volume environment where monthly cycles are too slow; update cost per interaction is low (prompt update, not fine-tuning) |
| Human-Labeled Loop | Outcome Tracker routes a sample to human labelers before the Signal Processor | Ground truth requires expert judgment; accept higher latency for signal quality (legal, medical, compliance domains) |
| Multi-Signal Loop | Multiple independent outcome signals combined: override rate + client satisfaction + dispute outcomes | No single metric captures the full outcome; diversify to reduce reward hacking risk and improve signal quality |
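A Multi-Signal Loop needs a rule for blending its signals. A minimal weighted-sum sketch; the function name and the weights are illustrative assumptions:

```python
def combined_outcome_score(override_rate, satisfaction, dispute_win_rate,
                           weights=(0.4, 0.3, 0.3)):
    """Blend three independent signals so no single metric can be
    reward-hacked in isolation. Override rate is inverted (lower is
    better); satisfaction and dispute win rate are in [0, 1]."""
    w_ovr, w_sat, w_disp = weights
    return (w_ovr * (1 - override_rate)
            + w_sat * satisfaction
            + w_disp * dispute_win_rate)
```

Because the score is a blend, gaming the override rate alone (as in the passive-voice example above) moves it far less than improving the underlying contract quality would.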

Related Patterns

| Pattern | Relationship |
| --- | --- |
| 10.11 Pipeline | The operational system that the Feedback Loop wraps: the Pipeline produces outputs; the Loop learns from outcomes |
| 10.15 Evaluator-Optimizer | Single-session analog: Evaluator-Optimizer is a micro-feedback loop scoped to one task; the Feedback Loop operates across thousands of tasks over months |

Investment Signal

Firms with clean feedback loops compound over time. Each operation makes the system better — value increases with time in production, not with headcount or capital raised. This is the clearest analog to network effects in AI-native organizations: the more you use the system, the better it gets, and the better it gets, the more you use it.

The valuation premium is justified if, and only if: (1) the feedback signal is clean, measuring actual outcomes rather than proxies; (2) the learning rate is measurable, with improvement per feedback cycle quantified; and (3) the loop is stable, with no accelerating distribution shift.

The single most important due diligence question for an AI-native firm: show me a performance-over-time chart for your core AI system. If performance does not measurably improve with operation time, the feedback loop either does not exist or does not work. All other moat claims are secondary to this one.