30.40 Dynamic Partial Join for Multiple Instances

Multiple instances run concurrently and new instances can be added dynamically during execution. A runtime completion condition — evaluated continuously — determines when to proceed. Neither M nor N is fixed at design time; both emerge from the execution itself.


Motivating Scenario

A web-crawling AI starts with a seed set of 10 URLs and discovers new pages to scrape as it goes. Each scraper agent processes one URL and may surface additional URLs that are added to the work queue. New scraper instances are spawned dynamically as new URLs are discovered. The aggregation phase begins when two conditions are simultaneously satisfied: the content quality score across all completed scrapers exceeds 0.85, AND the URL discovery queue is empty (no new work is being generated). Neither of these conditions can be evaluated before execution starts — both depend on what the crawl actually finds.

The key insight: this pattern is fundamentally different from static and static-partial MI variants. The instance set is open: new members can join during execution. The completion criterion is not "N of M instances done" — it is a predicate over execution state. This makes 30.40 the most expressive MI join pattern and also the hardest to implement correctly. It maps naturally to any AI workflow with recursive or exploratory structure: the process generates its own workload as it runs, and "done" is a semantic condition, not a count.
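The open instance set and predicate-based termination can be sketched in a few lines. This is a minimal, single-threaded simulation, not a production implementation: `scrape(url)` is a hypothetical callable returning `(quality_score, new_urls)`, and the 500-instance cap and 0.85 threshold are illustrative values carried over from the scenario. A real system would run scrapers concurrently and synchronize on the shared queue.

```python
import queue

def run_dynamic_join(seed_urls, scrape, quality_threshold=0.85, max_instances=500):
    """Single-threaded sketch of a dynamic partial join over an open instance set."""
    work = queue.SimpleQueue()
    for u in seed_urls:
        work.put(u)
    scores, spawned = [], 0
    while not work.empty():
        if spawned >= max_instances:      # budget circuit breaker (see Risk Exposure)
            break
        url = work.get()
        spawned += 1
        score, discovered = scrape(url)   # one "scraper instance"
        scores.append(score)
        for u in discovered:              # dynamically extend the instance set
            work.put(u)
    avg = sum(scores) / len(scores) if scores else 0.0
    done = work.empty() and avg > quality_threshold
    return done, spawned, avg
```

Note that the instance count `spawned` is an output of the run, not an input — the defining property of this pattern.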

Structure

Diagram: dynamic web crawl with quality-gated aggregation.

Key Metrics

| Metric | Signal |
| --- | --- |
| Total instance count per run | How many scraper instances were spawned — primary cost driver; track the distribution across runs to detect pathological cases |
| Queue depth over time | Tracks whether the crawl is converging (queue shrinking) or diverging (queue growing) — early warning for non-termination |
| Aggregate quality score trajectory | How quality evolves as more scrapers complete — validates whether more instances produce meaningful quality gains |
| Time to completion condition | Wall-clock time from seed to aggregation trigger — the primary latency signal for the end-to-end workflow |
| Node | What it does | What it receives | What it produces |
| --- | --- | --- | --- |
| URL Discoverer | Reads from the URL queue. If a URL is available, routes it to a Scraper Agent instance. If the queue is empty and the completion condition is met, routes to aggregation. Loops back to itself to re-check the queue. | URL queue state + quality gate signal | URL dispatched to new scraper instance, OR aggregation trigger |
| Scraper Agent | Fetches and parses the assigned URL. Extracts structured content and discovers outbound links. Adds new URLs to the shared queue. Emits a content quality score on completion. | Single URL + scraping config | Structured page content + new URL additions to the queue + quality score |
| Quality Gate | After each scraper completion, evaluates: (1) is the aggregate quality score > 0.85? (2) is the URL queue empty? Routes to the Discoverer for more work if either condition fails; routes to aggregation if both pass. | Scraper output + aggregate quality state + queue depth | Continue signal (to Discoverer) OR completion signal (to Aggregator) |
| Aggregate Results | Collects all completed scraper outputs and produces a unified knowledge graph from the crawl. | All completed scraper content artifacts | Crawl knowledge graph |

When to Use

Use when
Avoid when

Value Profile

| Origin of Value | Where it appears | How it is captured |
| --- | --- | --- |
| Future Cashflow | Crawl coverage quality | The dynamic instance set means coverage adapts to what the crawl finds. A topic-rich seed produces more instances and higher coverage; a sparse seed terminates quickly. Quality is outcome-adaptive rather than input-sized. |
| Governance | Quality Gate completion condition | The two-part predicate (quality score AND queue empty) is the governance mechanism. Each clause is an independently tunable policy parameter. Weakening either clause terminates the crawl earlier; strengthening either extends it. |
| Conditional Action | Each scraper instance | Compute cost is entirely determined by the runtime execution path — unknown at design time. Budget caps (max instances, max runtime) are essential guardrails for cost control. |
| Risk Exposure | Non-termination | If new URLs are discovered faster than scrapers complete, the queue never empties and the completion condition never fires. The crawl runs indefinitely. Mandatory circuit breakers (max total instances, max wall-clock time) are non-negotiable. |

Semantic completion vs. count-based completion

30.38 and 30.39 complete when a number is reached. 30.40 completes when a condition is true. This is a fundamentally different termination model. The quality of a 30.40 implementation depends entirely on the precision of the completion predicate. Vague predicates ("sufficient coverage") produce non-deterministic termination. Precise predicates ("quality score > 0.85 AND queue depth = 0 for 30 consecutive seconds") produce reproducible behavior.
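A precise predicate like the one above can be made testable by injecting the clock. This is a sketch under the scenario's assumed parameters (0.85 threshold, 30-second empty-queue dwell); the class name and interface are illustrative, not part of the pattern definition.

```python
import time

class CompletionPredicate:
    """Debounced completion check: quality threshold AND queue empty for a dwell period."""

    def __init__(self, threshold=0.85, dwell_s=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.dwell_s = dwell_s
        self.clock = clock          # injectable for deterministic tests
        self._empty_since = None

    def check(self, aggregate_quality, queue_depth):
        if queue_depth > 0:
            self._empty_since = None    # any new work resets the dwell timer
            return False
        if self._empty_since is None:
            self._empty_since = self.clock()
        dwell_ok = self.clock() - self._empty_since >= self.dwell_s
        return dwell_ok and aggregate_quality > self.threshold
```

Because the clock is a constructor parameter, the predicate's behavior is reproducible in unit tests with a fake clock — exactly the property vague predicates lack.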

Dynamics and Failure Modes

Non-termination (crawl depth explosion)

The crawl discovers high-density link graphs. Each scraper finds 20 new URLs. The queue grows faster than it drains. The "queue empty" condition never fires. The crawl runs indefinitely, consuming unbounded compute. Fix: implement a hard cap on total URL additions (e.g., max 500 URLs queued regardless of discovery). When the cap is hit, the queue is sealed — no new URLs are accepted — and the completion condition is re-evaluated with a relaxed criterion (quality score only; the queue clause is satisfied by "sealed and drained" rather than "empty").
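The sealed-queue fix can be sketched as a small wrapper around the work queue. The 500-URL cap is the example value from the text; everything else here (class and method names) is illustrative.

```python
class SealableQueue:
    """Work queue with a hard cap on total additions; seals itself when the cap is hit."""

    def __init__(self, max_urls=500):
        self.max_urls = max_urls
        self._added = 0
        self._items = []
        self.sealed = False

    def put(self, url):
        if self.sealed:
            return False                 # crawl frontier is frozen; discovery rejected
        self._items.append(url)
        self._added += 1
        if self._added >= self.max_urls:
            self.sealed = True           # no further discoveries accepted
        return True

    def get(self):
        return self._items.pop(0)

    def drained(self):
        # completion clause: "empty" for an unsealed queue,
        # "sealed and drained" once the cap has been hit
        return not self._items
```

The cap bounds total instances regardless of link density, turning a potentially divergent crawl into a guaranteed-terminating one.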

Race condition on completion evaluation

The Quality Gate evaluates the completion condition: queue is empty AND quality > 0.85. Both conditions are true. The gate signals aggregation. Simultaneously, a scraper adds 3 new URLs to the queue (race condition — the URL addition message was in flight when the evaluation ran). Aggregation starts on incomplete data. Fix: the "queue empty" condition must be evaluated with a distributed lock that also blocks new URL additions. "Queue empty" means "queue is empty and locked against further additions."
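One way to realize the lock requirement in a single process is to guard both URL additions and the completion check with the same mutex, and to close the frontier inside the critical section. This is a sketch of the idea (in a distributed deployment the lock would be a distributed lock, e.g. held in a coordination service); names are illustrative.

```python
import threading

class LockedFrontier:
    """URL frontier where the 'queue empty' check and additions share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._urls = []
        self._closed = False

    def add(self, url):
        with self._lock:
            if self._closed:
                return False        # aggregation already triggered; addition rejected
            self._urls.append(url)
            return True

    def take(self):
        with self._lock:
            return self._urls.pop(0) if self._urls else None

    def try_complete(self, quality_ok):
        # atomically: verify emptiness AND close the frontier, so no in-flight
        # discovery can land between the check and the aggregation trigger
        with self._lock:
            if quality_ok and not self._urls:
                self._closed = True
                return True
            return False
```

"Queue empty" is thus redefined as "queue empty and locked against further additions", which is exactly the fix the failure mode calls for.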

Quality score manipulation by outlier pages

One scraped page is exceptionally high-quality (score = 0.99). This single page pulls the aggregate score above the 0.85 threshold even though most scraped content is mediocre. The completion condition fires prematurely. Fix: use a robust aggregate (median or trimmed mean) rather than the arithmetic mean. Alternatively, require that both the quality threshold AND a minimum instance count are met.
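A trimmed mean is a few lines; the 10% trim fraction below is an illustrative default, not prescribed by the pattern.

```python
import statistics

def robust_aggregate(scores, trim=0.1):
    """Trimmed mean: drop the top and bottom `trim` fraction before averaging,
    so a single outlier page cannot drag the aggregate over the threshold."""
    s = sorted(scores)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if len(s) - 2 * k > 0 else s
    return sum(kept) / len(kept)
```

`statistics.median` is an even simpler robust alternative when per-page scores are noisy.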

Instance state loss on partial failure

The Quality Gate process crashes mid-execution. On recovery, it does not know which scraper instances are active, what the current aggregate quality is, or how many URLs are queued. The completion condition cannot be evaluated. Fix: gate state (active instance registry, aggregate quality accumulator, queue depth) must be durably persisted after every scraper completion. Recovery replays from the last checkpoint, not from the beginning.
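Durable persistence of the gate state can be sketched with a write-then-atomic-rename checkpoint, so a crash never leaves a torn file. The state fields shown (active-instance registry, quality accumulator, queue depth) follow the text; the function names and JSON encoding are assumptions.

```python
import json
import os
import tempfile

def save_gate_state(path, state):
    """Checkpoint gate state durably: write to a temp file, fsync, then atomically rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # force bytes to disk before publishing the checkpoint
    os.replace(tmp, path)      # atomic rename: readers see old state or new, never a torn file

def load_gate_state(path):
    """Recover the last checkpoint after a crash."""
    with open(path) as f:
        return json.load(f)
```

Calling `save_gate_state` after every scraper completion gives recovery a consistent snapshot to replay from, rather than restarting the crawl from the seed set.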

Variants

| Variant | Modification | When to use |
| --- | --- | --- |
| Budget-Capped Dynamic Join | Hard cap on total instances (max M); when the cap is reached, no new instances spawn and the completion condition collapses to quality-only | Compute budget is fixed; the dynamic crawl must terminate within resource limits regardless of discovery rate |
| Time-Bounded Dynamic Join | A timeout triggers forced completion after a maximum wall-clock duration; whatever is complete at that point is passed to aggregation | A hard latency SLA exists; graceful degradation on partial results is preferable to an SLA violation |
| Incremental Aggregation | Aggregation runs continuously as scrapers complete; the quality gate monitors aggregate output quality rather than individual scraper scores | Aggregation is cheap and incremental; running it continuously surfaces the quality signal needed for the completion condition more accurately than per-instance scores |
| Convergence-Detecting Completion | Completion fires when the marginal quality gain from the last K instances falls below a threshold (diminishing-returns detection) | Quality grows sublinearly with instance count; the optimal stopping point is the knee of the quality-cost curve |
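The Convergence-Detecting Completion variant reduces to a comparison over the aggregate-quality trajectory. A minimal sketch, with illustrative values for K and the minimum gain:

```python
def converged(quality_history, k=5, min_gain=0.005):
    """Fire when the aggregate quality gain over the last k instances
    falls below min_gain (diminishing-returns stop rule)."""
    if len(quality_history) <= k:
        return False                              # not enough signal yet
    gain = quality_history[-1] - quality_history[-1 - k]
    return gain < min_gain
```

In practice this check would run inside the Quality Gate after each scraper completion, replacing (or supplementing) the fixed 0.85 threshold.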

Related Patterns

| Pattern | Relationship |
| --- | --- |
| 60.65 Static Partial Join MI | Fixed-M, fixed-N variant — use when both are known at design time; far simpler to implement and audit |
| 60.66 Cancelling Partial Join MI | Fixed M, fixed N, with cancellation — use when cost recovery after N completions is more important than dynamic expansion |
| 40.48 Generalised AND-Join | Waits for all activated branches — the full-N complement; 30.40 adds dynamic instance creation and semantic completion |
| 10.15 Evaluator-Optimizer | The Quality Gate's role in 30.40 mirrors the Evaluator role — both assess output quality and decide whether to continue or stop |
| 30.31 Feedback Loop | The Discoverer–Scraper–QualityGate cycle is a feedback loop where scrapers feed new work back into the system; 30.40 adds the termination condition |