40.41 Tool-Use Loop (ReAct)

The agent alternates between Thought (reasoning about what to do next) and Action (calling a tool), repeating until the task is complete or the budget is exhausted. Routing is decided by the model at runtime rather than fixed at design time.


Motivating Scenario

A hedge fund analyst needs to answer: "What is the 3-year revenue CAGR of the top-5 cloud infrastructure companies and how does it compare to Azure's current guidance?" A single LLM call cannot answer this — it requires web search for recent earnings, a calculator for CAGR computation, a database lookup for Azure guidance, and synthesis of results. The ReAct agent performs 6-9 tool calls per query, completes in 45 seconds, and produces a cited answer.

Without the loop, the analyst spends 2 hours per query. The key structural insight: the agent does not know at start time which tools it will need or in what order. Tool selection and sequencing are emergent — decided by the Reasoner at each iteration based on what it has already learned.
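The emergent Thought/Action cycle described above can be sketched as a minimal loop. This is an illustrative sketch, not a specific framework's API: the `reason` callable stands in for the LLM (returning either a tool-call directive or a final answer), and `tools` is a plain dict of callables.

```python
from typing import Callable

# Minimal ReAct loop sketch (names illustrative). `reason(query, observations)`
# returns either ("final", answer) or ("call", tool_name, tool_input).
def react_loop(reason: Callable, tools: dict, query: str, max_iters: int = 10):
    observations = []  # the Observation Buffer
    for _ in range(max_iters):
        step = reason(query, observations)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        tool = tools.get(tool_name)
        if tool is None:
            observations.append(f"error: unknown tool {tool_name!r}")
            continue
        # Every observation comes from a real tool call, never from the model.
        observations.append(f"{tool_name}({tool_input!r}) -> {tool(tool_input)}")
    return None  # budget exhausted without a final answer
```

Note that tool selection and ordering live entirely inside `reason`: the loop itself imposes no sequence, which is exactly the pattern's defining property.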

Structure

[Structure diagram — concrete example: hedge fund financial research agent]

Key Metrics

Metric | Signal
Mean tool calls per task | Indicates task complexity and loop efficiency — high counts signal either deep tasks or poor Reasoner decision-making
Tool call success rate | Fraction of calls that return a valid result — failures force re-iteration and inflate cost
Context utilization % | Buffer fullness at task completion — approaching 100% signals context stuffing risk
Task completion rate within budget | Fraction of tasks that reach a final answer before hitting iteration or token limits
Answer quality score | End-to-end accuracy evaluated against ground truth or human judgment — the primary output metric
Node | What it does | What it receives | What it produces
Reasoner | Determines next action or declares task complete | Original query + observation buffer | Thought + action directive, or final answer signal
Tool Dispatcher | Selects and calls the appropriate tool; enforces allowed tool policy | Action directive from Reasoner | Tool call result (raw)
Web Search | Fetches current earnings data and analyst reports | Search query string | Ranked search results with snippets
Calculator | Computes CAGR and other financial metrics exactly | Revenue figures and time period | Numeric result with formula trace
Database Lookup | Retrieves Azure guidance and structured financial records | Structured query | Record set from internal data store
Observation Buffer | Appends each tool result to the running context; feeds Reasoner on next iteration | Raw tool result | Updated context window for Reasoner
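The Calculator node's central computation in the motivating scenario is CAGR. A minimal sketch of that node, using the standard formula (end/start)^(1/years) - 1:

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    if start <= 0 or years <= 0:
        raise ValueError("start value and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0

# Revenue growing from 100 to 172.8 over 3 years compounds at 20% per year.
```

Delegating this to a real tool rather than letting the model do the arithmetic in-context is the point: the tool's result is exact and carries a reproducible formula trace.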

When to Use

Use when
Avoid when

Value Profile

Origin of Value | Where it appears | How it is captured
Future Cashflow | Answer quality | Quality increases monotonically with tool access breadth. Each additional privileged tool (proprietary DB, real-time feed) widens the quality gap over competitors using only public data.
Governance | Tool Dispatcher | The Dispatcher is the policy enforcement layer — it defines which tools the agent may call and under what conditions. Governance is not in the Reasoner; it is in the allowed-tool registry.
Conditional Action | Each iteration | Every Reasoner-Dispatcher cycle is compute spend. Budget blindness — agent iterating without cost awareness — is the primary cost failure mode.
Risk Exposure | Tool calls and Observation Buffer | Tool hallucination (agent fabricating output instead of calling the tool) and context stuffing (buffer exceeds context window) are the two catastrophic failure vectors.
VCM analog: Access Token. The agent's value derives from its access to tools. A Tool-Use Loop without privileged tool access is just a reasoning loop — the tools are the moat, not the reasoning.

Dynamics and Failure Modes

Infinite reasoning loop

The Reasoner never emits a Done=true signal — it continues generating actions indefinitely. This occurs when the task is underspecified, the completion criterion is ambiguous, or the model is not prompted to self-terminate. Fix: define an explicit completion condition in the system prompt, and enforce a hard iteration cap at the Dispatcher level independent of model output.
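The Dispatcher-level hard cap can be sketched as follows. The class and names are illustrative; the key property is that the cap is enforced outside the model, so a Reasoner that never self-terminates is still stopped.

```python
class BudgetExceeded(Exception):
    pass

# Hard iteration cap enforced at the Dispatcher, independent of model output:
# even if the Reasoner keeps emitting actions, the (max_calls + 1)-th call fails.
class Dispatcher:
    def __init__(self, tools: dict, max_calls: int):
        self.tools = tools
        self.max_calls = max_calls
        self.calls = 0

    def dispatch(self, tool_name: str, tool_input):
        if self.calls >= self.max_calls:
            raise BudgetExceeded(f"iteration cap of {self.max_calls} reached")
        self.calls += 1
        return self.tools[tool_name](tool_input)
```

The surrounding loop catches `BudgetExceeded` and prompts the Reasoner for a best-effort answer, which is also the Constrained ReAct variant below.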

Tool hallucination

The Reasoner fabricates a plausible tool result in its Thought step rather than issuing a real tool call. The Observation Buffer receives a hallucinated observation, and subsequent reasoning compounds the error. This is undetectable from the model's output alone. Fix: require all observations to come from a verified Dispatcher response; never allow the model to self-supply observations.
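One way to enforce "observations only from a verified Dispatcher response" is cryptographic provenance. This is a sketch under the assumption of a per-session key the model never sees: the Dispatcher signs every result, and the buffer verifies the tag before appending, so a fabricated observation cannot enter the context.

```python
from dataclasses import dataclass
import hashlib
import hmac

_KEY = b"dispatcher-session-key"  # hypothetical; held by the runtime, never the model

@dataclass(frozen=True)
class Observation:
    content: str
    tag: str

def sign(content: str) -> Observation:
    """Called only by the Dispatcher after a real tool call."""
    tag = hmac.new(_KEY, content.encode(), hashlib.sha256).hexdigest()
    return Observation(content, tag)

def append_verified(buffer: list, obs: Observation) -> None:
    """The Observation Buffer rejects anything the Dispatcher did not sign."""
    expected = hmac.new(_KEY, obs.content.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, obs.tag):
        raise ValueError("observation did not come from the Dispatcher")
    buffer.append(obs.content)
```

A simpler variant with the same effect is structural: the runtime, not the model, writes observations into the buffer, and the model's Thought text is never parsed for observation content.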

Context stuffing

After many iterations, the Observation Buffer fills the context window. The Reasoner loses access to the original query or early observations. Quality degrades silently — the model does not signal that its context is truncated. Fix: implement a summarization step that compresses older observations before appending new ones, or cap observation verbosity at the Dispatcher.
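A compaction pass can be sketched as below. The truncation stands in for an LLM summarization call (an assumption of this sketch); the invariant to preserve is that the original query and the most recent observations survive intact while older ones shrink.

```python
# One compaction pass over the Observation Buffer. In production the
# truncation would be an LLM summarization call; repeat the pass (or
# summarize more aggressively) if the buffer is still over budget.
def compact(observations: list[str], max_chars: int, keep_recent: int = 2) -> list[str]:
    def size(obs):
        return sum(len(o) for o in obs)
    if size(observations) <= max_chars:
        return observations
    old, recent = observations[:-keep_recent], observations[-keep_recent:]
    summaries = [o[:40] + "…" if len(o) > 40 else o for o in old]
    return summaries + recent  # recent observations kept verbatim
```

Running compaction before each append, rather than after overflow, is what prevents the silent degradation: the Reasoner never sees a truncated context it was not told about.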

Budget blindness

The agent does not track token spend or iteration count against its allocated budget. It exhausts compute before completing the task, or produces a best-effort answer without flagging incompleteness. Fix: inject a budget state variable into the Reasoner's context at each iteration; prompt it to produce a partial answer when approaching limits.
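The injected budget state can be as small as one telemetry line prepended to the Reasoner's context each iteration. Names and the 10% threshold here are illustrative:

```python
# Budget-state injection sketch: the runtime prepends this line to the
# Reasoner's prompt every iteration so the model can decide to wrap up.
def budget_preamble(iteration: int, max_iters: int,
                    tokens_used: int, token_budget: int) -> str:
    iters_left = max_iters - iteration
    tokens_left = token_budget - tokens_used
    line = f"[budget] iterations left: {iters_left}, tokens left: {tokens_left}"
    if iters_left <= 1 or tokens_left < token_budget * 0.1:
        line += ". Produce your best partial answer now and flag incompleteness."
    return line
```

Because the hard cap lives in the Dispatcher, this preamble is advisory: it lets the agent degrade gracefully to a flagged partial answer instead of being cut off mid-task.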

Variants

Variant | Modification | When to use
Constrained ReAct | Hard cap on iterations and total tokens enforced at Dispatcher; agent prompted to produce best-effort answer on budget expiry | Production environments where cost and latency SLAs must be guaranteed
Parallel Tool-Use | Reasoner dispatches multiple tool calls simultaneously; Dispatcher fans out, collects results, Reasoner synthesizes | Independent tool calls with no ordering dependency — reduces wall-clock latency
Cached Tool-Use | Dispatcher memoizes tool results keyed by (tool, normalized input) within the session | Agent likely to re-query the same data (e.g., same company across multiple questions in a session)
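The Cached Tool-Use variant is a small change to the Dispatcher. A sketch (class and normalization rule are illustrative; real normalization would be tool-specific):

```python
# Cached Tool-Use: memoize results keyed by (tool name, normalized input)
# for the lifetime of the session.
class CachingDispatcher:
    def __init__(self, tools: dict):
        self.tools = tools
        self.cache = {}
        self.misses = 0  # counts actual tool invocations

    @staticmethod
    def normalize(tool_input: str) -> str:
        # Naive normalization: case-fold and collapse whitespace.
        return " ".join(tool_input.lower().split())

    def dispatch(self, tool_name: str, tool_input: str):
        key = (tool_name, self.normalize(tool_input))
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.tools[tool_name](tool_input)
        return self.cache[key]
```

Scoping the cache to the session, as the table specifies, sidesteps staleness for real-time data sources: a fresh session always re-fetches.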

Related Patterns

Pattern | Relationship
10.11 Pipeline | Use when the tool sequence is fixed and known at design time — eliminates runtime routing overhead
20.23 Orchestrator-Workers | Multiple Tool-Use Loop agents coordinated by a higher-level Orchestrator — appropriate when subtasks require separate agents
10.15 Evaluator-Optimizer | Add a quality gate on the final answer before delivery — the Evaluator can trigger a fresh loop if quality is insufficient

Investment Signal

The Tool-Use Loop is the pattern where the moat is most legible: enumerate the tools, enumerate the policies. An agent with access to a proprietary real-time financial database, an internal calculation engine, and a compliance-checked tool registry is structurally differentiated from one calling only public APIs. The Dispatcher's allowed-tool registry is the IP surface.

Audit signal: request a Dispatcher trace log. It shows every tool called, every result received, and every iteration. A firm that cannot produce this log has no observability into its agent's decision process — and cannot price errors, scope liability, or improve the system.

Red flag: observation buffers without summarization. When context stuffing is unmanaged, agent quality degrades non-linearly as task complexity increases. The system appears to work in demos (short tasks) and fails silently in production (long research tasks).