Multiple instances of a task run in parallel. Once N of the M instances complete, the remaining M-N active instances are cancelled and the next task is triggered immediately. No waiting for stragglers; no archival of the cancelled instances' results. This saves compute at the cost of discarding in-flight work.
A distributed search system shards a query across 20 parallel retrieval agents — each agent searches a different data shard. Once 12 agents return results, sufficient coverage is achieved and ranking begins immediately. The remaining 8 agents are cancelled: their HTTP connections are closed, their allocated memory is released, and their partial results are discarded. The ranking engine operates on the 12 returned result sets.
The key insight: retrieval coverage follows diminishing returns. The first 12 shards typically return the most relevant results (by probability of match). Waiting for all 20 delays ranking without meaningfully improving result quality for most queries. More importantly, cancelling the remaining 8 agents frees compute that can be immediately reallocated to the next query — enabling higher throughput at the same infrastructure cost. This is the resource-reclaiming variant of the partial join: pay for N results, recover cost of M-N immediately.
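The end-to-end flow can be sketched with Python's `asyncio`: launch M shard tasks, return as soon as N complete, and cancel the stragglers so their compute is reclaimed. The retrieval body below is a simulated stand-in; all names and timings are illustrative.

```python
import asyncio
import random

async def shard_retrieval(shard_id: int) -> tuple[int, list[str]]:
    # Simulated shard scan; a real agent would query a data partition.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return shard_id, [f"doc-{shard_id}-{i}" for i in range(3)]

async def cancelling_partial_join(m: int = 20, n: int = 12):
    tasks = [asyncio.create_task(shard_retrieval(i)) for i in range(m)]
    done: list[tuple[int, list[str]]] = []
    pending = set(tasks)
    while len(done) < n and pending:
        finished, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        done.extend(t.result() for t in finished)
    # N results collected: cancel the remaining M-N instances.
    for t in pending:
        t.cancel()
    # Await the cancellations so resources are released before ranking.
    await asyncio.gather(*pending, return_exceptions=True)
    return done[:n]  # a completion batch may overshoot N; keep exactly N

results = asyncio.run(cancelling_partial_join())
```

The `gather(..., return_exceptions=True)` step is the application-level analogue of "confirm cancellation before passing results forward": ranking does not start until every cancelled task has actually unwound.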
| Metric | Signal |
|---|---|
| P50 ranking latency | Time from query launch to ranking start — determined by the N-th fastest shard completion |
| Cancellation efficiency | Fraction of allocated compute recovered via cancellation — (M-N)/M is the theoretical maximum; actual depends on how far cancelled instances had progressed |
| Recall@N vs. recall@M | Quality difference between operating on N results vs all M — the core calibration metric for the N threshold |
| Cancellation confirmation rate | Fraction of cancel signals successfully confirmed — below 95% indicates a zombie agent problem |
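Cancellation efficiency, the second metric above, follows directly from per-instance progress at cancel time. A minimal sketch with hypothetical numbers:

```python
def cancellation_efficiency(m: int, n: int, progress: dict[int, float]) -> float:
    """Fraction of total allocated compute recovered by cancellation.
    `progress` maps each cancelled instance to the fraction of its work
    already done when the cancel signal landed (the sunk cost)."""
    assert len(progress) == m - n, "one entry per cancelled instance"
    recovered = sum(1.0 - p for p in progress.values())
    return recovered / m

# Theoretical maximum: all 8 cancelled before doing any work -> (M-N)/M = 0.4
ideal = cancellation_efficiency(20, 12, {i: 0.0 for i in range(8)})
# Realistic: each cancelled instance was halfway through its scan -> 0.2
actual = cancellation_efficiency(20, 12, {i: 0.5 for i in range(8)})
```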
| Node | What it does | What it receives | What it produces |
|---|---|---|---|
| Launch Retrieval | Shards the query across 20 data partitions and spawns one retrieval agent per shard | Query + shard routing table | 20 parallel retrieval agent activations |
| Shard Retrieval (×20) | Each agent scans its assigned shard, scores candidates, and returns a ranked result set. May be cancelled before completion. | Query + shard index | Ranked candidate list (if not cancelled) |
| Cancel Remaining | Fires when 12 results arrive. Sends cancellation signals to all remaining active retrieval instances. Confirms cancellation before passing results forward. | 12 result sets + active instance registry | 12 result sets + cancellation confirmation |
| Rank Results | Merges and re-ranks the 12 result sets using a cross-shard scoring model | 12 ranked candidate lists | Final ranked output for the query |
| Origin of Value | Where it appears | How it is captured |
|---|---|---|
| Future Cashflow | Query throughput | Cancelling 8 agents per query frees 40% of allocated compute immediately. At 1000 queries/minute, this translates directly to infrastructure cost reduction or higher QPS on the same hardware. |
| Conditional Action | Cancelled instances | Each cancelled instance spent some compute before cancellation (fetch, partial scan). This sunk cost is unavoidable. The gain is the compute not spent after cancellation. Net efficiency depends on how early in their execution the 8 instances are cancelled. |
| Risk Exposure | Result quality at threshold N | If the 8 cancelled shards contain the highest-relevance documents for this specific query, quality degrades. For uniform shard quality this risk is symmetric and low; for non-uniform shards, shard selection strategy matters. |
| Governance | Cancellation mechanism reliability | The system must be able to reliably cancel in-flight agents and confirm the cancellation. A cancelled agent that continues consuming resources defeats the purpose. Monitor cancellation confirmation rate. |
N calibration is a quality-cost tradeoff. Setting N=12 out of 20 is an engineering claim: "12 shards provide sufficient recall for this workload." This should be validated empirically by running A/B tests between N=12 and N=20, measuring recall difference against cost difference. The optimal N is workload-specific and should be revisited as query distributions shift.
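Such a calibration run can be sketched offline by comparing recall over the union of the fastest N result sets against all M. The shard contents below are synthetic; in production the comparison would run on logged queries with relevance judgments.

```python
import random

def recall(result_sets: list[set[str]], relevant: set[str]) -> float:
    retrieved: set[str] = set().union(*result_sets) if result_sets else set()
    return len(retrieved & relevant) / len(relevant)

random.seed(0)
relevant = {f"doc-{i}" for i in range(100)}
# Synthetic corpus: relevant docs spread uniformly across 20 shards.
shards: list[set[str]] = [set() for _ in range(20)]
for d in sorted(relevant):
    shards[random.randrange(20)].add(d)

recall_at_n = recall(shards[:12], relevant)  # the 12 "fastest" shards
recall_at_m = recall(shards, relevant)       # waiting for all 20
```

The gap `recall_at_m - recall_at_n` against the 40% compute saving is the tradeoff the A/B test quantifies.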
The cancel signals are sent but 3 of the 8 remaining agents ignore them (network partition, implementation bug). These agents continue running and consuming resources but their results are no longer being tracked. Fix: maintain an active instance registry with heartbeat monitoring. Any instance that does not confirm cancellation within a timeout is force-killed and its resource allocation is reclaimed at the infrastructure level, not the application level.
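The registry check reduces to a set computation: an agent is a zombie if it was sent a cancel, never confirmed, and is still heartbeating past the confirmation timeout. A minimal sketch (agent names and the helper are hypothetical):

```python
def find_zombies(cancelled: set[str], confirmed: set[str],
                 still_heartbeating: set[str]) -> set[str]:
    # Cancelled, unconfirmed, and still alive past the confirmation
    # timeout: force-kill these at the infrastructure level.
    return (cancelled - confirmed) & still_heartbeating

# a3, a7, a9 were sent cancel signals; only a3 confirmed.
# a7 still heartbeats (zombie); a9 went silent (likely already dead).
zombies = find_zombies({"a3", "a7", "a9"}, {"a3"}, {"a3", "a7"})  # {"a7"}
```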
Some retrieval agents return empty result sets (their shard had no matching documents). The partial join fires at 12 completions including 4 empty results. The ranker receives only 8 non-empty result sets. Fix: distinguish between "complete with results" and "complete empty" tokens. Only non-empty completions count toward the N threshold, or adjust N to account for expected empty-shard rate.
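The fix amounts to counting only non-empty completions toward the threshold; a sketch, where the `Completion` type is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Completion:
    shard_id: int
    results: list[str] = field(default_factory=list)

def threshold_reached(completions: list[Completion], n: int) -> bool:
    # "Complete empty" tokens are recorded but do not advance the join.
    non_empty = sum(1 for c in completions if c.results)
    return non_empty >= n

done = [Completion(0, ["d1"]), Completion(1, []), Completion(2, ["d2"])]
reached_2 = threshold_reached(done, 2)  # True: two non-empty completions
reached_3 = threshold_reached(done, 3)  # False: the empty shard does not count
```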
A cancelled agent is mid-response when the cancel signal arrives and sends its result anyway before shutting down. The ranker receives 13 results instead of 12; if the ranker assumes exactly 12, this causes an off-by-one error. Fix: the cancel confirmation step must include an idempotency filter — once the cancellation signal has been issued, any further result is silently dropped, regardless of whether it arrived before or after the physical connection close.
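The idempotency filter can live in the join gate itself: the gate closes the instant the N-th result is accepted, and anything arriving afterwards is dropped. A minimal sketch (the class name is illustrative):

```python
class ResultGate:
    """Join gate that admits exactly N results and drops stragglers."""
    def __init__(self, n: int):
        self.n = n
        self.results: list[str] = []
        self.closed = False

    def accept(self, result: str) -> bool:
        if self.closed:
            return False           # late straggler: silently dropped
        self.results.append(result)
        if len(self.results) >= self.n:
            self.closed = True     # gate closes exactly at N
        return True

gate = ResultGate(n=2)
gate.accept("r1")
gate.accept("r2")
accepted_late = gate.accept("r3")  # agent was mid-response at cancel time
```

Downstream consumers can then safely assume exactly N results, whatever the agents do on the wire.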
20 shards are not uniform in document density. The 8 densest shards (containing 70% of relevant documents) happen to be the 8 slowest. The first 12 to complete are the sparse shards. N=12 was calibrated on uniform shards and produces poor recall in practice. Fix: shard assignment should account for expected density and latency. High-density shards should be prioritized in the dispatch ordering, not treated as equivalent to sparse shards.
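Density-aware dispatch can be as simple as sorting shards before launch so that high-density shards are the most likely to be among the first N completions. The field names below are hypothetical:

```python
def dispatch_order(shards: list[dict]) -> list[int]:
    # Launch densest shards first; break density ties by expected latency.
    return [s["id"] for s in sorted(
        shards, key=lambda s: (-s["density"], s["latency"]))]

shards = [
    {"id": 0, "density": 0.05, "latency": 10},
    {"id": 1, "density": 0.40, "latency": 80},
    {"id": 2, "density": 0.40, "latency": 30},
]
order = dispatch_order(shards)  # [2, 1, 0]: densest first, faster tie wins
```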
| Variant | Modification | When to use |
|---|---|---|
| Quality-Adaptive Cancel | Cancel fires when aggregate quality score of completed results exceeds a threshold, not at a fixed count | Query difficulty varies; some queries need 8 results and some need 18 — a fixed N wastes resources on easy queries |
| Soft Cancel with Grace Period | Cancel signal is a "please stop soon" request; instances have a short grace period to complete naturally | Abrupt cancellation is expensive (connection teardown overhead); instances typically complete within 100ms of receiving the signal |
| Tiered Cancellation | At N1, results are passed forward; at N2, remaining instances are cancelled; between N1 and N2, instances can still complete and their results are appended | Latency reduction is more important than resource reclamation; the brief window between N1 and N2 captures additional results cheaply |
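The Quality-Adaptive Cancel variant replaces the fixed count with an aggregate-score check; a minimal sketch in which the scoring and threshold are illustrative:

```python
def should_cancel(completed_scores: list[float], quality_threshold: float) -> bool:
    # Fire the cancel once aggregate quality, not a fixed count, crosses
    # the threshold: easy queries stop early, hard ones run longer.
    return sum(completed_scores) >= quality_threshold

easy = should_cancel([0.9, 0.8, 0.7], 2.0)  # True: three results suffice
hard = should_cancel([0.2, 0.1], 2.0)       # False: keep waiting
```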
| Pattern | Relationship |
|---|---|
| 60.65 Static Partial Join MI | The non-cancelling variant — remaining instances complete normally; use when archival value justifies the compute cost |
| 60.67 Dynamic Partial Join MI | When M or N are determined at runtime or new instances can be added dynamically |
| 40.45 Cancelling Discriminator | The N=1 special case — fires on first completion and cancels all others; the pattern described here generalizes it to N-of-M |
| 60.61 MI Without Synchronization | No join, no cancellation — instances write independently; use when even the N-threshold coordination overhead is unneeded |