40.41 Tool-Use Loop (ReAct)

The agent alternates between Thought (reasoning about what to do next) and Action (calling a tool), repeating until the task is complete or the budget is exhausted. Routing is decided by the model at runtime rather than fixed at design time.


Motivating Scenario

A hedge fund analyst needs to answer: "What is the 3-year revenue CAGR of the top-5 cloud infrastructure companies and how does it compare to Azure's current guidance?" A single LLM call cannot answer this — it requires web search for recent earnings, a calculator for CAGR computation, a database lookup for Azure guidance, and synthesis of results. The ReAct agent performs 6-9 tool calls per query, completes in 45 seconds, and produces a cited answer.

Without the loop, the analyst spends 2 hours per query. The key structural insight: the agent does not know at start time which tools it will need or in what order. Tool selection and sequencing are emergent — decided by the Reasoner at each iteration based on what it has already learned.
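The emergent Thought/Action cycle described above can be sketched as a minimal loop. This is an illustrative sketch, not a specific framework's API: the `reason` callable stands in for the LLM (returning either a tool-call directive or a final answer), and `tools` is a plain dict of callables.

```python
from typing import Callable

# Minimal ReAct loop sketch (names illustrative). `reason(query, observations)`
# returns either ("final", answer) or ("call", tool_name, tool_input).
def react_loop(reason: Callable, tools: dict, query: str, max_iters: int = 10):
    observations = []  # the Observation Buffer
    for _ in range(max_iters):
        step = reason(query, observations)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        tool = tools.get(tool_name)
        if tool is None:
            observations.append(f"error: unknown tool {tool_name!r}")
            continue
        # Every observation comes from a real tool call, never from the model.
        observations.append(f"{tool_name}({tool_input!r}) -> {tool(tool_input)}")
    return None  # budget exhausted without a final answer
```

Note that tool selection and ordering live entirely inside `reason`: the loop itself imposes no sequence, which is exactly the pattern's defining property.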

Structure

[Structure diagram — concrete example: hedge fund financial research agent]

Key Metrics

Metric | Signal
Mean tool calls per task | Indicates task complexity and loop efficiency — high counts signal either deep tasks or poor Reasoner decision-making
Tool call success rate | Fraction of calls that return a valid result — failures force re-iteration and inflate cost
Context utilization % | Buffer fullness at task completion — approaching 100% signals context stuffing risk
Task completion rate within budget | Fraction of tasks that reach a final answer before hitting iteration or token limits
Answer quality score | End-to-end accuracy evaluated against ground truth or human judgment — the primary output metric
Node | What it does | What it receives | What it produces
Reasoner | Determines next action or declares task complete | Original query + observation buffer | Thought + action directive, or final answer signal
Tool Dispatcher | Selects and calls the appropriate tool; enforces allowed tool policy | Action directive from Reasoner | Tool call result (raw)
Web Search | Fetches current earnings data and analyst reports | Search query string | Ranked search results with snippets
Calculator | Computes CAGR and other financial metrics exactly | Revenue figures and time period | Numeric result with formula trace
Database Lookup | Retrieves Azure guidance and structured financial records | Structured query | Record set from internal data store
Observation Buffer | Appends each tool result to the running context; feeds Reasoner on next iteration | Raw tool result | Updated context window for Reasoner
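The Calculator node's central computation in the motivating scenario is CAGR. A minimal sketch of that node, using the standard formula (end/start)^(1/years) - 1:

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    if start <= 0 or years <= 0:
        raise ValueError("start value and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0

# Revenue growing from 100 to 172.8 over 3 years compounds at 20% per year.
```

Delegating this to a real tool rather than letting the model do the arithmetic in-context is the point: the tool's result is exact and carries a reproducible formula trace.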

When to Use

Use when
Avoid when

Value Profile

Origin of Value | Where it appears | How it is captured
Future Cashflow | Answer quality | Quality increases monotonically with tool access breadth. Each additional privileged tool (proprietary DB, real-time feed) widens the quality gap over competitors using only public data.
Governance | Tool Dispatcher | The Dispatcher is the policy enforcement layer — it defines which tools the agent may call and under what conditions. Governance is not in the Reasoner; it is in the allowed-tool registry.
Conditional Action | Each iteration | Every Reasoner-Dispatcher cycle is compute spend. Budget blindness — agent iterating without cost awareness — is the primary cost failure mode.
Risk Exposure | Tool calls and Observation Buffer | Tool hallucination (agent fabricating output instead of calling the tool) and context stuffing (buffer exceeds context window) are the two catastrophic failure vectors.
VCM analog: Access Token. The agent's value derives from its access to tools. A Tool-Use Loop without privileged tool access is just a reasoning loop — the tools are the moat, not the reasoning.

Dynamics and Failure Modes

Infinite reasoning loop

The Reasoner never emits a Done=true signal — it continues generating actions indefinitely. This occurs when the task is underspecified, the completion criterion is ambiguous, or the model is not prompted to self-terminate. Fix: define an explicit completion condition in the system prompt, and enforce a hard iteration cap at the Dispatcher level independent of model output.
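The Dispatcher-level hard cap can be sketched as follows. The class and names are illustrative; the key property is that the cap is enforced outside the model, so a Reasoner that never self-terminates is still stopped.

```python
class BudgetExceeded(Exception):
    pass

# Hard iteration cap enforced at the Dispatcher, independent of model output:
# even if the Reasoner keeps emitting actions, the (max_calls + 1)-th call fails.
class Dispatcher:
    def __init__(self, tools: dict, max_calls: int):
        self.tools = tools
        self.max_calls = max_calls
        self.calls = 0

    def dispatch(self, tool_name: str, tool_input):
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"iteration cap of {self.max_calls} reached")
        self.calls += 1
        return self.tools[tool_name](tool_input)
```

The surrounding loop catches `BudgetExceeded` and prompts the Reasoner for a best-effort answer, which is also the Constrained ReAct variant below.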

Tool hallucination

The Reasoner fabricates a plausible tool result in its Thought step rather than issuing a real tool call. The Observation Buffer receives a hallucinated observation, and subsequent reasoning compounds the error. This is undetectable from the model's output alone. Fix: require all observations to come from a verified Dispatcher response; never allow the model to self-supply observations.
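One way to enforce "observations only from a verified Dispatcher response" is cryptographic provenance. This is a sketch under the assumption of a per-session key the model never sees: the Dispatcher signs every result, and the buffer verifies the tag before appending, so a fabricated observation cannot enter the context.

```python
from dataclasses import dataclass
import hashlib
import hmac

_KEY = b"dispatcher-session-key"  # hypothetical; held by the runtime, never the model

@dataclass(frozen=True)
class Observation:
    content: str
    tag: str

def sign(content: str) -> Observation:
    """Called only by the Dispatcher after a real tool call."""
    tag = hmac.new(_KEY, content.encode(), hashlib.sha256).hexdigest()
    return Observation(content, tag)

def append_verified(buffer: list, obs: Observation) -> None:
    """The Observation Buffer rejects anything the Dispatcher did not sign."""
    expected = hmac.new(_KEY, obs.content.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, obs.tag):
        raise ValueError("observation did not come from the Dispatcher")
    buffer.append(obs.content)
```

A simpler variant with the same effect is structural: the runtime, not the model, writes observations into the buffer, and the model's Thought text is never parsed for observation content.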

Context stuffing

After many iterations, the Observation Buffer fills the context window. The Reasoner loses access to the original query or early observations. Quality degrades silently — the model does not signal that its context is truncated. Fix: implement a summarization step that compresses older observations before appending new ones, or cap observation verbosity at the Dispatcher.
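A compaction pass can be sketched as below. The truncation stands in for an LLM summarization call (an assumption of this sketch); the invariant to preserve is that the original query and the most recent observations survive intact while older ones shrink.

```python
# One compaction pass over the Observation Buffer. In production the
# truncation would be an LLM summarization call; repeat the pass (or
# summarize more aggressively) if the buffer is still over budget.
def compact(observations: list[str], max_chars: int, keep_recent: int = 2) -> list[str]:
    def size(obs):
        return sum(len(o) for o in obs)
    if size(observations) <= max_chars:
        return observations
    old, recent = observations[:-keep_recent], observations[-keep_recent:]
    summaries = [o[:40] + "…" if len(o) > 40 else o for o in old]
    return summaries + recent  # recent observations kept verbatim
```

Running compaction before each append, rather than after overflow, is what prevents the silent degradation: the Reasoner never sees a truncated context it was not told about.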

Budget blindness

The agent does not track token spend or iteration count against its allocated budget. It exhausts compute before completing the task, or produces a best-effort answer without flagging incompleteness. Fix: inject a budget state variable into the Reasoner's context at each iteration; prompt it to produce a partial answer when approaching limits.
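The injected budget state can be as small as one telemetry line prepended to the Reasoner's context each iteration. Names and the 10% threshold here are illustrative:

```python
# Budget-state injection sketch: the runtime prepends this line to the
# Reasoner's prompt every iteration so the model can decide to wrap up.
def budget_preamble(iteration: int, max_iters: int,
                    tokens_used: int, token_budget: int) -> str:
    iters_left = max_iters - iteration
    tokens_left = token_budget - tokens_used
    line = f"[budget] iterations left: {iters_left}, tokens left: {tokens_left}"
    if iters_left <= 1 or tokens_left < token_budget * 0.1:
        line += ". Produce your best partial answer now and flag incompleteness."
    return line
```

Because the hard cap lives in the Dispatcher, this preamble is advisory: it lets the agent degrade gracefully to a flagged partial answer instead of being cut off mid-task.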

Variants

Variant | Modification | When to use
Constrained ReAct | Hard cap on iterations and total tokens enforced at Dispatcher; agent prompted to produce best-effort answer on budget expiry | Production environments where cost and latency SLAs must be guaranteed
Parallel Tool-Use | Reasoner dispatches multiple tool calls simultaneously; Dispatcher fans out, collects results, Reasoner synthesizes | Independent tool calls with no ordering dependency — reduces wall-clock latency
Cached Tool-Use | Dispatcher memoizes tool results keyed by (tool, normalized input) within the session | Agent likely to re-query the same data (e.g., same company across multiple questions in a session)
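The Cached Tool-Use variant is a small change to the Dispatcher. A sketch (class and normalization rule are illustrative; real normalization would be tool-specific):

```python
# Cached Tool-Use: memoize results keyed by (tool name, normalized input)
# for the lifetime of the session.
class CachingDispatcher:
    def __init__(self, tools: dict):
        self.tools = tools
        self.cache = {}
        self.misses = 0  # counts actual tool invocations

    @staticmethod
    def normalize(tool_input: str) -> str:
        # Naive normalization: case-fold and collapse whitespace.
        return " ".join(tool_input.lower().split())

    def dispatch(self, tool_name: str, tool_input: str):
        key = (tool_name, self.normalize(tool_input))
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.tools[tool_name](tool_input)
        return self.cache[key]
```

Scoping the cache to the session, as the table specifies, sidesteps staleness for real-time data sources: a fresh session always re-fetches.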

Related Patterns

Pattern | Relationship
10.11 Pipeline | Use when the tool sequence is fixed and known at design time — eliminates runtime routing overhead
20.23 Orchestrator-Workers | Multiple Tool-Use Loop agents coordinated by a higher-level Orchestrator — appropriate when subtasks require separate agents
10.15 Evaluator-Optimizer | Add a quality gate on the final answer before delivery — the Evaluator can trigger a fresh loop if quality is insufficient

Investment Signal

The Tool-Use Loop is the pattern where the moat is most legible: enumerate the tools, enumerate the policies. An agent with access to a proprietary real-time financial database, an internal calculation engine, and a compliance-checked tool registry is structurally differentiated from one calling only public APIs. The Dispatcher's allowed-tool registry is the IP surface.

Audit signal: request a Dispatcher trace log. It shows every tool called, every result received, and every iteration. A firm that cannot produce this log has no observability into its agent's decision process — and cannot price errors, scope liability, or improve the system.

Red flag: observation buffers without summarization. When context stuffing is unmanaged, agent quality degrades non-linearly as task complexity increases. The system appears to work in demos (short tasks) and fails silently in production (long research tasks).