20.30 Interleaved Parallel Routing

A set of tasks must each execute exactly once, in any order, but never concurrently: at most one is active at a time. There is no predefined sequence; the execution order emerges from resource availability at runtime.

Motivating Scenario

A pharmaceutical AI system must run three checks on each candidate compound: Solubility Check, Toxicity Screen, and Patent Search. Each check uses a different specialized model — one for cheminformatics, one for biomedical NLP, one for legal document retrieval. All three models share a single GPU with 80GB VRAM. Only one model can be loaded and running at a time.

The key insight: the three tasks are logically parallel (they are independent of each other's results) but physically serial (the shared GPU resource enforces mutual exclusion). The order in which they execute does not matter — any permutation produces the same compound report. What matters is that all three complete exactly once. A simple sequential pipeline would fix an arbitrary order; Interleaved Parallel Routing lets the scheduler choose order based on actual resource state.

Structure

Concrete example: Pharmaceutical compound analysis (interleaved execution, one model active at a time)

Key Metrics

Metric	Signal
GPU utilization rate	Fraction of time GPU is actively computing vs. idle between tasks — high idle time signals scheduling inefficiency
Per-task wait time	Time each task waits before acquiring the GPU lock — monitors starvation risk across task types
Execution order entropy	Variability of the execution order across runs, measured over the distribution of observed orderings. Low entropy suggests the scheduler has a de facto fixed order despite the interleaving design
Lock acquisition failures	Count of timeout or deadlock events per cycle — primary reliability signal for the resource coordination layer
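As one way to make the execution order entropy metric concrete, the sketch below computes the Shannon entropy of the orderings observed across runs. The function name, the task labels, and the sample data are illustrative assumptions, not part of the pattern definition.

```python
import math
from collections import Counter

def execution_order_entropy(orders):
    """Shannon entropy (in bits) of the observed execution orders.

    `orders` holds one tuple per run, e.g. ("Solubility", "Toxicity", "Patent").
    """
    if not orders:
        return 0.0
    counts = Counter(orders)
    total = len(orders)
    # Sum of -p * log2(p) over each distinct ordering that was observed.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical run logs: one scheduler always picks the same order, one varies.
fixed = [("Solubility", "Toxicity", "Patent")] * 6
varied = [
    ("Solubility", "Toxicity", "Patent"),
    ("Toxicity", "Patent", "Solubility"),
    ("Patent", "Solubility", "Toxicity"),
    ("Solubility", "Patent", "Toxicity"),
]

fixed_entropy = execution_order_entropy(fixed)
varied_entropy = execution_order_entropy(varied)
```

With three tasks there are six possible orderings, so the entropy is bounded above by log2(6), roughly 2.58 bits. A value near zero over many runs is the signal described in the table.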

Execution note: The diagram shows the logical topology (all three tasks enabled from start, merging into a compound report). At runtime, a resource-aware scheduler enforces mutual exclusion — only one of the three tasks runs at any given moment. The order varies per execution based on GPU availability. This is interleaved execution, not true parallelism.

Node	What it does	What it receives	What it produces
Solubility Check	Runs cheminformatics model to predict aqueous solubility. Loads the solubility model onto GPU, runs inference, unloads. Result is independent of other checks.	Compound SMILES string	Solubility score + confidence interval
Toxicity Screen	Runs biomedical NLP model against toxicity databases. Loads toxicity model onto GPU, screens compound, unloads.	Compound structure + toxicity database	Toxicity flags + risk classification
Patent Search	Runs legal document retrieval model against patent databases. Identifies prior art, freedom-to-operate risks.	Compound structure + patent database	Patent hits + FTO risk score
Compound Report	AND join: waits for all three checks to complete (in whatever order they ran). Merges results into a structured compound evaluation report.	Solubility score + toxicity flags + patent hits	Compound report: viability assessment
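The topology in the table can be sketched with a thread pool and a single lock. This is a minimal illustration under stated assumptions: `gpu_lock` stands in for the shared GPU, the string returned by `run_check` is a placeholder for the real load, infer, unload cycle, and `"CCO"` is just a sample SMILES input.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

gpu_lock = threading.Lock()   # stands in for the single shared GPU
active = 0                    # number of checks currently "on the GPU"
max_active = 0                # highest concurrency ever observed
execution_order = []          # order chosen at runtime, not at design time

CHECKS = ["Solubility Check", "Toxicity Screen", "Patent Search"]

def run_check(name, compound):
    global active, max_active
    with gpu_lock:            # mutual exclusion: one model loaded at a time
        active += 1
        max_active = max(max_active, active)
        execution_order.append(name)
        result = f"{name} result for {compound}"  # placeholder for real inference
        active -= 1
    return name, result

def evaluate_compound(compound):
    # All three checks are enabled from the start (logically parallel).
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        futures = [pool.submit(run_check, check, compound) for check in CHECKS]
        # AND join: block until every check has completed exactly once.
        return dict(f.result() for f in futures)

report = evaluate_compound("CCO")
```

The workflow code submits all three checks without stating an order. The lock alone decides which one runs next, so `execution_order` may differ between runs while `report` always contains the same three entries.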

When to Use

Use when

Tasks are logically independent but share a mutually exclusive physical resource
All tasks must complete; order does not affect results
Resource contention is the primary execution constraint (GPU, API rate limit, license seat)
A fixed sequential order would be arbitrary and wasteful
Task set is known at design time

Avoid when

Tasks are truly independent and resources allow — use true parallelism (30.35)
Tasks have data dependencies — order matters and is not interchangeable
Resource contention is resolved externally (e.g., cloud auto-scaling) — no need for interleaving logic
Some tasks can be skipped — use OR routing instead

Value Profile

Origin of Value	Where it appears	How it is captured
Future Cashflow	Compound Report quality	All three dimensions always evaluated — no dimension is skipped due to ordering accidents. Report quality is invariant to execution order.
Conditional Action	GPU resource allocation	The scheduler optimizes GPU utilization by choosing which task to run next based on current load, model warm-up state, or priority. A fixed sequential order cannot exploit these signals.
Risk Exposure	Resource deadlock	If a task holds the GPU lock while waiting for an external API (e.g., patent database), the other tasks are blocked for the whole wait and may starve. Locks must be held only during GPU computation, not during I/O waits.
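The rule about lock scope in the Risk Exposure row can be sketched as follows. The helper names `fetch_patent_documents` and `rank_documents_on_gpu` are hypothetical stand-ins for an I/O step and a GPU step; the `events` list only exists to show whether the lock was held at each point.

```python
import threading

gpu_lock = threading.Lock()
events = []  # trace of (phase, lock_was_held) pairs

def fetch_patent_documents(compound):
    # Hypothetical I/O step (e.g. a patent database call); runs WITHOUT the lock.
    events.append(("io", gpu_lock.locked()))
    return [f"doc-about-{compound}"]

def rank_documents_on_gpu(documents):
    # Hypothetical GPU step; runs WITH the lock held.
    events.append(("gpu", gpu_lock.locked()))
    return {"patent_hits": len(documents)}

def patent_search(compound):
    documents = fetch_patent_documents(compound)  # I/O wait: GPU stays free for others
    with gpu_lock:                                # lock scoped to computation only
        return rank_documents_on_gpu(documents)

result = patent_search("CCO")
```

While Patent Search waits on the external database, the lock is free and the scheduler can hand the GPU to Solubility Check or Toxicity Screen.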

Interleaved vs. parallel. Interleaved parallel routing is parallel in intent but serial in execution. The workflow model expresses no ordering constraint. The execution engine enforces mutual exclusion via a resource token. This separation — between workflow logic and execution policy — is the defining characteristic of the pattern.

Dynamics and Failure Modes

Priority inversion

A low-priority task (Patent Search) holds the GPU lock when a high-priority compound enters the queue. The high-priority compound must wait for the low-priority task to release the resource. Fix: implement preemptive scheduling with checkpointing — allow a higher-priority task to interrupt a lower-priority one, save state, and resume the lower-priority task after the high-priority work completes.
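A simplified sketch of preemption with checkpointing, assuming each task can be split into steps with a safe pause point between them. Here a Python generator plays the role of the checkpoint (its suspended state is the saved progress); a real system would also need to persist and restore model state on the GPU, which this sketch does not attempt. All names are illustrative.

```python
def checkpointed_task(name, steps, log):
    """A task that yields after each step so the scheduler can preempt it."""
    for step in range(1, steps + 1):
        log.append(f"{name}:step{step}")
        yield step  # checkpoint: progress is kept in the suspended generator

def run_with_preemption(low, high_arrives_after, high, log):
    """Run `low`; after `high_arrives_after` steps, pause it and run `high` to completion."""
    done = 0
    for _ in low:
        done += 1
        if done == high_arrives_after:
            log.append("preempt")
            for _ in high:       # high-priority task takes the resource
                pass
            log.append("resume")  # low-priority task continues from its checkpoint
    return log

log = []
low = checkpointed_task("Patent Search", 3, log)
high = checkpointed_task("Toxicity Screen", 2, log)
run_with_preemption(low, 1, high, log)
```

The low-priority task is never restarted: it resumes at step 2, so each step still executes exactly once.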

One task never scheduled

The scheduler consistently picks Solubility Check and Toxicity Screen ahead of Patent Search because they complete faster and the GPU is never free long enough for Patent Search's longer startup time. The AND join never receives the Patent Search result. Fix: implement starvation detection — any task waiting beyond a maximum wait time is promoted to highest priority.
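One possible shape for the starvation check is a selection function that promotes any task whose wait exceeds a threshold. The function name, the data layout, and the numeric values below are assumptions made for illustration.

```python
def pick_next(waiting, now, max_wait):
    """Choose the next task to receive the GPU.

    `waiting` maps task name -> (priority, enqueue_time); a lower priority
    value runs first. Any task that has waited longer than `max_wait`
    is promoted ahead of all others.
    """
    starved = [name for name, (_, t) in waiting.items() if now - t > max_wait]
    if starved:
        # Among starved tasks, serve the one that has waited longest.
        return min(starved, key=lambda name: waiting[name][1])
    return min(waiting, key=lambda name: waiting[name][0])

# Hypothetical queue state: Patent Search has been waiting since t=10.
waiting = {
    "Solubility Check": (1, 95.0),
    "Toxicity Screen": (2, 98.0),
    "Patent Search": (3, 10.0),
}

promoted = pick_next(waiting, now=100.0, max_wait=60.0)
normal = pick_next(waiting, now=100.0, max_wait=120.0)
```

With a 60-second limit, Patent Search (waiting 90 seconds) is chosen despite its low priority. With a 120-second limit nothing is starved, and the ordinary priority order applies.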

Resource lock leak on task failure

The Toxicity Screen model crashes mid-inference and exits without releasing the GPU lock. All other tasks wait indefinitely. Fix: GPU lock acquisition must use a try/finally pattern — the lock is released unconditionally on task exit, whether successful or not.
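A minimal sketch of the try/finally fix, using a `threading.Lock` as a stand-in for the GPU lock. The crashing task is simulated with an exception; the function names are illustrative.

```python
import threading

gpu_lock = threading.Lock()

def run_on_gpu(task):
    gpu_lock.acquire()
    try:
        return task()
    finally:
        gpu_lock.release()  # runs on success and on exception alike

def crashing_toxicity_screen():
    raise RuntimeError("model crashed mid-inference")

try:
    run_on_gpu(crashing_toxicity_screen)
except RuntimeError:
    pass  # the failure still propagates to the caller for handling

lock_leaked = gpu_lock.locked()
```

In Python, `with gpu_lock:` gives the same guarantee more compactly. Note that try/finally only covers failures raised inside the process; if the worker process itself is killed, the lock holder would need an external safeguard such as a lease with a timeout.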

Variants

Variant	Modification	When to use
Priority-Weighted Interleave	Scheduler assigns weights to tasks; higher-weight tasks get resource preference	Some checks are more consequential, e.g. Toxicity Screen should be scheduled ahead of Patent Search when the GPU becomes available
Batched Interleave	Multiple compounds queued; scheduler batches same-model tasks across compounds	Model load time dominates inference time; batching same-model work across compounds amortizes load cost
Partial-Order Interleave	Some tasks have a required predecessor; others remain freely orderable	One check requires a prior result as input but the remaining two are free — partial ordering with an interleave over the free subset
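The Batched Interleave variant can be sketched as a grouping step that runs before scheduling. The compound identifiers and the load-count comparison are illustrative assumptions; the unbatched count assumes the worst case in which every check reloads its model.

```python
from collections import defaultdict

def batch_by_model(pending):
    """Group (compound, check) pairs so each model is loaded once per batch."""
    batches = defaultdict(list)
    for compound, check in pending:
        batches[check].append(compound)
    return dict(batches)

# Hypothetical queue of pending work across two compounds.
pending = [
    ("compound-A", "Solubility Check"),
    ("compound-A", "Toxicity Screen"),
    ("compound-B", "Solubility Check"),
    ("compound-B", "Toxicity Screen"),
]

batches = batch_by_model(pending)
model_loads_unbatched = len(pending)  # worst case: one load per (compound, check)
model_loads_batched = len(batches)    # one load per distinct model
```

Each batch still runs under the same mutual exclusion as before; only the unit of work handed to the GPU changes from a single check to a group of same-model checks.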

Related Patterns

Pattern	Relationship
60.62 MI Design-Time	True parallelism variant — use when the resource constraint is removed and tasks can execute simultaneously
10.11 Pipeline	Fixed sequential variant — use when order matters and tasks are not interchangeable
20.23 Orchestrator-Workers	Dynamic task ordering by an orchestrator — use when the task set itself is not fixed at design time