20.22 Multi-Merge

A convergence point where each completing incoming branch independently triggers the outgoing flow — no synchronization. If N branches are active, the downstream task fires N times. Each branch result is processed as it arrives, not after all branches complete.


Motivating Scenario

A legal document processing system fans out to three OCR engines simultaneously: a fast low-cost engine, a deep high-accuracy engine, and a handwriting specialist. Each engine has a different latency and cost profile. Rather than waiting for all three before indexing, each result triggers an independent indexing job the moment it arrives.

The index accumulates results incrementally. When the fast engine returns in 2 seconds, the index is updated immediately. When the deep engine returns in 8 seconds, the index is enriched further. When the handwriting engine returns in 12 seconds, the final enrichment is applied. Each completion is valuable on its own — the index is useful before all three are done. This is Multi-Merge: the downstream task is idempotent and accumulative, designed to fire multiple times.
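
The scenario above can be sketched with `asyncio`. This is a minimal illustration, not the system's actual implementation: the engine names, the scaled-down delays, and the helpers `run_engine` and `process_document` are all assumptions made for the example.

```python
import asyncio

# Hypothetical engines; delays are scaled-down stand-ins for 2 s / 8 s / 12 s.
ENGINE_DELAYS = {"fast": 0.02, "deep": 0.08, "handwriting": 0.12}


async def run_engine(name: str, delay: float) -> dict:
    """Stand-in for an OCR call; sleeps instead of doing real work."""
    await asyncio.sleep(delay)
    return {"engine": name, "text": f"text from {name}"}


async def process_document(doc_id: str, index: dict) -> list:
    """AND-split to all engines, then fire the downstream step once per completion."""
    fired = []
    tasks = [asyncio.create_task(run_engine(n, d)) for n, d in ENGINE_DELAYS.items()]
    for next_done in asyncio.as_completed(tasks):
        result = await next_done
        # Multi-Merge: no waiting for sibling branches; index this result now.
        index.setdefault(doc_id, {})[result["engine"]] = result["text"]
        fired.append(result["engine"])
    return fired


index: dict = {}
firing_order = asyncio.run(process_document("doc-1", index))
print(firing_order)
```

The indexing step runs three times for one document, once per engine, in completion order rather than dispatch order.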

Structure

Diagram (concrete example): legal document OCR with incremental indexing

Key Metrics

| Metric | Signal |
| --- | --- |
| Time to first result | Latency of the fastest branch; determines how quickly the index becomes useful |
| Time to full enrichment | Latency of the slowest branch; when the index reaches maximum fidelity |
| Per-engine contribution delta | How much each engine adds over the previous; quantifies the marginal value of each branch |
| Downstream invocation count per document | Should equal the active branch count N. Count != N indicates lost or duplicate firings. |

| Node | What it does | What it receives | What it produces |
| --- | --- | --- | --- |
| Dispatch OCR | Sends the document to all three OCR engines simultaneously via AND-split | Raw document (PDF or image) | Three simultaneous OCR jobs dispatched |
| Fast OCR | Low-latency OCR pass: high throughput, moderate accuracy, returns in ~2 seconds | Document | Structured text extraction (confidence: ~85%) |
| Deep OCR | High-accuracy OCR pass: runs full layout analysis and semantic correction | Document | Structured text extraction (confidence: ~97%) |
| Handwriting OCR | Specialist model for handwritten annotations and marginalia | Document | Handwritten section extraction + annotation metadata |
| Index Result | Incrementally enriches the document index with each arriving OCR result. Fires independently for each completing branch and is designed to handle multiple invocations per document. | Single OCR result (whichever arrived) | Updated document index entry (idempotent upsert) |
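
The metrics listed under Key Metrics can be derived from a per-document log of branch completions. The sketch below assumes such a log exists; the `Completion` record, its `index_fields_after` counter, and the sample values are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    engine: str
    seconds_after_dispatch: float
    index_fields_after: int  # hypothetical: populated index fields after this upsert


def key_metrics(completions: list, active_branches: int) -> dict:
    """Derive the Key Metrics for one document from its completion log."""
    ordered = sorted(completions, key=lambda c: c.seconds_after_dispatch)
    deltas, previous = {}, 0
    for c in ordered:
        # Contribution delta: what this engine added over the previous state.
        deltas[c.engine] = c.index_fields_after - previous
        previous = c.index_fields_after
    return {
        "time_to_first_result": ordered[0].seconds_after_dispatch,
        "time_to_full_enrichment": ordered[-1].seconds_after_dispatch,
        "contribution_delta": deltas,
        "invocation_count_matches": len(completions) == active_branches,
    }


# Illustrative values only; the latencies follow the 2 s / 8 s / 12 s scenario.
events = [
    Completion("deep", 8.0, 46),
    Completion("fast", 2.0, 40),
    Completion("handwriting", 12.0, 51),
]
metrics = key_metrics(events, active_branches=3)
print(metrics)
```

A `False` value for `invocation_count_matches` is the signal the table describes: a firing was lost or duplicated.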

When to Use

Use when
Avoid when

Value Profile

| Origin of Value | Where it appears | How it is captured |
| --- | --- | --- |
| Future Cashflow | Incremental index quality | Each branch completion improves index fidelity. The fast engine provides immediate utility (2s latency). The deep engine corrects errors later. The handwriting engine adds coverage unavailable from the others. Total value is the sum of independent contributions. |
| Governance | Index Result node | Idempotency is a correctness constraint. The indexing logic must handle out-of-order arrivals gracefully, since the deep engine may occasionally beat the fast engine under load. Without idempotent upsert logic, the governance of "index reflects latest best result" breaks. |
| Conditional Action | Each branch independently | N branches means N index invocations. Cost is proportional to N, not to a single synchronized join. The compute model is additive, not multiplicative: each engine runs independently and charges for its own execution. |
| Risk Exposure | Index consistency window | Between the first and last branch completion, the index is in a partially enriched state. Queries during this window may return incomplete results. The risk is latency-bounded: the window closes when the slowest branch completes. |

Contrast with AND-join and 20.23. Synchronization (AND-join) waits for all N branches before triggering downstream once. Structured Discriminator (20.23) triggers downstream once on the first completion and ignores the rest. Multi-Merge triggers downstream N times, once per branch completion. Use Multi-Merge when each completion independently adds value to an accumulative target.
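
The difference between the three convergence behaviors comes down to how many times downstream fires and with which inputs. The function below is a toy model written for this comparison; `downstream_firings` and its mode names are hypothetical, and it assumes every branch in the list has already completed.

```python
def downstream_firings(completions: list, mode: str) -> list:
    """Return the inputs of each downstream firing, given completions in arrival order."""
    if mode == "and_join":       # fire once, after all branches have completed
        return [list(completions)] if completions else []
    if mode == "discriminator":  # fire once, on the first completion; ignore the rest
        return [[completions[0]]] if completions else []
    if mode == "multi_merge":    # fire once per completion, no synchronization
        return [[c] for c in completions]
    raise ValueError(f"unknown mode: {mode}")


arrivals = ["fast", "deep", "handwriting"]
for mode in ("and_join", "discriminator", "multi_merge"):
    print(mode, downstream_firings(arrivals, mode))
```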

Dynamics and Failure Modes

Non-idempotent downstream corruption

The indexing function appends results rather than performing an upsert — each OCR result adds a new record instead of enriching the existing one. After three completions, the document has three separate index entries with conflicting extracted text. Downstream search queries return triplicated results. Fix: the Index Result node must implement idempotent upsert semantics keyed on (document_id, engine_id). An append-only design is incompatible with Multi-Merge.
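
A minimal sketch of the fix, assuming an in-memory store: the `DocumentIndex` class and its method names are hypothetical, and a real index would sit behind a database or search engine with its own upsert primitive.

```python
class DocumentIndex:
    """Toy index with one slot per (document_id, engine_id)."""

    def __init__(self):
        self._entries = {}  # (document_id, engine_id) -> extracted text

    def upsert(self, document_id: str, engine_id: str, text: str) -> None:
        # Writing to the same key replaces the slot instead of adding a record,
        # so replaying an invocation leaves the index unchanged.
        self._entries[(document_id, engine_id)] = text

    def entries_for(self, document_id: str) -> dict:
        return {e: t for (d, e), t in self._entries.items() if d == document_id}


index = DocumentIndex()
index.upsert("doc-1", "fast", "Lease Agremeent")
index.upsert("doc-1", "deep", "Lease Agreement")
index.upsert("doc-1", "handwriting", "margin: see clause 4")
index.upsert("doc-1", "fast", "Lease Agremeent")  # redelivery: no new record
print(index.entries_for("doc-1"))
```

Four invocations leave exactly three slots, one per engine. An append-based store would hold four records after the same sequence.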

Out-of-order enrichment overwrites better results

The deep engine (97% confidence) returns first due to document simplicity. The fast engine (85% confidence) returns 500ms later and overwrites the higher-quality result because the merge logic uses last-write-wins. Fix: the merge logic must compare confidence scores and retain the highest-confidence value per field, not apply the most recent write unconditionally.
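
One way to express this fix is a per-field merge that compares confidence before writing. The sketch assumes each result maps a field name to a `(value, confidence)` pair; that shape, the function name `merge_fields`, and the sample strings are illustrative assumptions.

```python
def merge_fields(current: dict, incoming: dict) -> dict:
    """Merge two {field: (value, confidence)} dicts, keeping the higher confidence per field."""
    merged = dict(current)
    for field, (value, confidence) in incoming.items():
        # Overwrite only when the incoming value is strictly more confident.
        if field not in merged or confidence > merged[field][1]:
            merged[field] = (value, confidence)
    return merged


deep = {"title": ("Lease Agreement", 0.97)}
fast = {"title": ("Lease Agremeent", 0.85)}

# The deep result arrives first; the later fast result must not overwrite it.
state = merge_fields(merge_fields({}, deep), fast)
print(state)
```

With a strict comparison, equal-confidence results keep whichever arrived first, so a deterministic tie-breaker (for example, engine priority) would be needed if ties are possible.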

Partial-result window exploitation

A downstream consumer reads the index 1 second after the fast engine completes, before the deep and handwriting engines return. The consumer sees an 85%-confidence partial extraction and makes a downstream decision based on incomplete data. The multi-merge has not failed, but its incremental enrichment model is invisible to the consumer. Fix: expose index completeness metadata alongside results, including which engines have completed and when the final enrichment is expected.
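
A sketch of what such completeness metadata could look like on read. The `IndexEntry` class, its field names, and the `expected_complete_by` timestamp are assumptions for illustration; how the expected completion time is estimated is left open.

```python
from dataclasses import dataclass, field
from typing import Optional

EXPECTED_ENGINES = ("fast", "deep", "handwriting")


@dataclass
class IndexEntry:
    # Hypothetical: epoch seconds, set by the dispatcher when the branches start.
    expected_complete_by: Optional[float] = None
    text_by_engine: dict = field(default_factory=dict)

    def record(self, engine: str, text: str) -> None:
        self.text_by_engine[engine] = text

    def read(self) -> dict:
        """Return the content together with metadata about what is still missing."""
        pending = [e for e in EXPECTED_ENGINES if e not in self.text_by_engine]
        return {
            "text_by_engine": dict(self.text_by_engine),
            "completed_engines": [e for e in EXPECTED_ENGINES if e in self.text_by_engine],
            "pending_engines": pending,
            "fully_enriched": not pending,
            "expected_complete_by": self.expected_complete_by,
        }


entry = IndexEntry(expected_complete_by=1_700_000_012.0)
entry.record("fast", "draft text")
snapshot = entry.read()
print(snapshot["pending_engines"], snapshot["fully_enriched"])
```

A consumer can then decide whether to act on the partial result or wait until `fully_enriched` is true.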

Variants

| Variant | Modification | When to use |
| --- | --- | --- |
| Threshold Multi-Merge | Downstream fires only after at least K of N branches complete, then fires once for each subsequent completion | A minimum evidence set is required before any processing is useful; partial results below K are noise, not signal |
| Weighted Multi-Merge | Each branch result carries a confidence weight; downstream applies weighted merge rather than independent upsert | Branch outputs are estimates of the same underlying value; a weighted average is more accurate than last-write-wins |
| Bounded Multi-Merge | Multi-Merge within a structured block; all branches are guaranteed to eventually complete, enabling clean close-out | Process lifecycle management requires knowing when enrichment is definitively complete; unbounded merge makes closure ambiguous |
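
The Threshold variant can be sketched as a small gate in front of the downstream task. This reading is one interpretation of the table entry, stated as an assumption: the first K results are buffered and delivered together in a single firing, after which each later completion fires on its own. The class name `ThresholdMultiMerge` is hypothetical.

```python
class ThresholdMultiMerge:
    """Hold back results until K have arrived, then behave like a plain Multi-Merge."""

    def __init__(self, k: int, downstream):
        self.k = k
        self.downstream = downstream  # callable taking a list of results
        self._buffer = []
        self._open = False

    def on_completion(self, result) -> None:
        if self._open:
            self.downstream([result])  # past the threshold: one firing per completion
            return
        self._buffer.append(result)
        if len(self._buffer) >= self.k:
            self._open = True
            self.downstream(list(self._buffer))  # first firing carries the buffered K
            self._buffer.clear()


firings = []
gate = ThresholdMultiMerge(k=2, downstream=firings.append)
for engine in ("fast", "deep", "handwriting"):
    gate.on_completion(engine)
print(firings)
```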

Related Patterns

| Pattern | Relationship |
| --- | --- |
| 40.41 Multi-Choice (OR-Split) | Common upstream pairing: OR-split activates a variable subset of branches; Multi-Merge collects each completion independently. |
| 40.43 Structured Discriminator | Contrast: fires downstream once on first completion. Use when only the fastest result matters and subsequent arrivals are discarded. |
| 10.11 Pipeline (AND-join) | Contrast: fires downstream once after all branches complete. Use when all results must be present before any processing. |

Investment Signal

Multi-Merge is the architecture of incremental enrichment pipelines. The pattern is commercially significant in any domain where multiple data sources improve the same artifact over time: document intelligence, multi-model ensembles, multi-source data fusion.

The idempotency requirement is a hidden engineering cost. Teams underestimate how difficult it is to build downstream tasks that correctly handle multiple, out-of-order, partial invocations. Systems that appear to use Multi-Merge but have non-idempotent downstream code accumulate silent data corruption at scale.

Due diligence question: does the downstream indexing/processing logic have explicit tests for out-of-order multi-invocation? If not, the system works in demos (branches arrive in expected order, small scale) and fails in production (high load, variable latency, concurrent documents).
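
An explicit out-of-order test of the kind the due diligence question asks about can be quite small. The sketch below is an assumption-laden example, not a prescribed test suite: it uses a hypothetical confidence-aware `apply_result` function and made-up sample results, replays them in every arrival order with one redelivery, and checks that the final index is always the same.

```python
from itertools import permutations

# Hypothetical results: field -> (value, confidence).
RESULTS = {
    "fast": {"title": ("Lease Agremeent", 0.85)},
    "deep": {"title": ("Lease Agreement", 0.97)},
    "handwriting": {"margin_note": ("see clause 4", 0.90)},
}


def apply_result(index: dict, result: dict) -> None:
    """Downstream step under test: keep the highest-confidence value per field."""
    for field, (value, confidence) in result.items():
        if field not in index or confidence > index[field][1]:
            index[field] = (value, confidence)


def final_states() -> list:
    """Run the downstream step for every arrival order, plus one redelivery each."""
    states = []
    for order in permutations(RESULTS):
        index = {}
        for engine in order:
            apply_result(index, RESULTS[engine])
        apply_result(index, RESULTS[order[0]])  # simulate a duplicate firing
        states.append(index)
    return states


states = final_states()
assert all(state == states[0] for state in states), "result depends on arrival order"
print(states[0])
```

With three branches there are only six orderings, so exhaustive checking is cheap; for larger N, sampling random orderings is a common fallback.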