Running tests in a scheduler#
Tests can be run under a workload manager (scheduler) such as Slurm or PBS by adding the following options to canary run:
canary run [-b spec=(duration:T|count:{max,auto,N})[,layout:{flat,atomic}][,nodes:{any,same}]] -b scheduler=SCHEDULER -b workers=N ...
When run in “batch” mode, canary will group tests into “batches” and submit each batch to SCHEDULER.
Batching options#
Batch spec#
duration:T: create batches with an approximate run length of T seconds
count:max: one test per batch
count:auto: choose the number of batches automatically, depending on other options
count:N: create at most N batches
layout:flat: batches have no intra-batch dependencies but may have inter-batch dependencies
layout:atomic: batches have no inter-batch dependencies but may have intra-batch dependencies
nodes:any: tests are batched with respect to node count of test cases
nodes:same: tests are batched with tests having the same node count
The default batch spec is duration:30m,nodes:any,layout:flat.
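As an illustration of what batching by duration means, the following Python sketch greedily packs tests into batches whose total runtime stays near a target T. It is a sketch under stated assumptions, not canary's actual partitioning algorithm: the helper name pack_by_duration and the (name, runtime) tuples are hypothetical, and dependencies and node counts are ignored.

```python
from typing import List, Tuple


def pack_by_duration(tests: List[Tuple[str, float]], target: float) -> List[List[str]]:
    """Greedily group (name, runtime) pairs into batches of roughly `target` seconds.

    Hypothetical helper for illustration only; canary's real batching may differ.
    """
    batches: List[List[str]] = []
    loads: List[float] = []
    # Place the longest tests first so they anchor their own batches.
    for name, runtime in sorted(tests, key=lambda t: t[1], reverse=True):
        for i, load in enumerate(loads):
            if load + runtime <= target:
                batches[i].append(name)
                loads[i] += runtime
                break
        else:
            # No existing batch has room: open a new one.
            batches.append([name])
            loads.append(runtime)
    return batches


tests = [("a", 20.0), ("b", 15.0), ("c", 10.0), ("d", 5.0)]
print(pack_by_duration(tests, target=30.0))  # [['a', 'c'], ['b', 'd']]
```

A test longer than the target still gets a batch of its own, which is one reason the resulting run lengths can only be approximate.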
Note
-b spec=count:N and -b spec=duration:T are mutually exclusive.
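The rule in this note can be sketched as a small validation step. The Python below is a hypothetical parser written for illustration, not canary's implementation: the function name parse_batch_spec is an assumption, as is the choice to drop the default duration when a count is given.

```python
from typing import Dict

# Defaults as stated in the text: duration:30m,nodes:any,layout:flat
DEFAULTS = {"duration": "30m", "nodes": "any", "layout": "flat"}


def parse_batch_spec(spec: str) -> Dict[str, str]:
    """Parse 'key:value[,key:value...]' and reject count combined with duration."""
    given: Dict[str, str] = {}
    for item in spec.split(","):
        key, sep, value = item.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed spec item: {item!r}")
        given[key.strip()] = value.strip()
    if "count" in given and "duration" in given:
        raise ValueError("count and duration are mutually exclusive")
    result = dict(DEFAULTS)
    if "count" in given:
        # Assumption: an explicit count replaces the default duration.
        result.pop("duration")
    result.update(given)
    return result


print(parse_batch_spec("count:4"))
print(parse_batch_spec("duration:600,nodes:same"))
```

Calling parse_batch_spec("count:4,duration:60") raises ValueError, which mirrors the restriction described in the note.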
Batch scheduler#
-b scheduler=S: use scheduler S to run batches.
-b option=option: pass option to the scheduler. If option contains commas, it is split into multiple options at the commas. For example, -b option="-q debug,-A ABC123" passes -q debug and -A ABC123 directly to the scheduler.
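The comma-splitting behavior described for -b option can be pictured with a few lines of Python. This is a minimal sketch of the idea, not canary's code; the function name split_scheduler_options is hypothetical, and stripping whitespace and dropping empty pieces are assumptions.

```python
from typing import List


def split_scheduler_options(option: str) -> List[str]:
    """Split a -b option=... value at commas (hypothetical helper).

    Assumption: surrounding whitespace is stripped and empty pieces are dropped.
    """
    return [piece.strip() for piece in option.split(",") if piece.strip()]


pieces = split_scheduler_options("-q debug,-A ABC123")
print(pieces)  # ['-q debug', '-A ABC123']
```

One consequence of splitting at commas is that a scheduler option whose own value contains a comma cannot be passed this way without being broken apart.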
The following schedulers are currently supported:
Note
The shell scheduler is not performant; it is primarily useful for running examples on machines that don’t have an actual batch scheduler set up.
Batch concurrency#
Batch concurrency can be controlled by
--workers=N: Submit N concurrent batches to the scheduler at any one time. The default is 5.
-b workers=N: Execute the batch asynchronously using a pool of at most N workers. By default, the maximum number of available workers is used.
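The two settings act at different levels: one bounds how many batches are in flight, the other bounds how many tests run at once inside a batch. The Python sketch below models that nesting with two pools. It is only an analogy under stated assumptions: run_test, run_batch, and run_all are hypothetical names, and threads are used for brevity rather than whatever pool canary uses internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_test(name: str) -> str:
    # Stand-in for executing a single test.
    return f"{name}: done"


def run_batch(batch: List[str], batch_workers: int) -> List[str]:
    # Inner pool: analogous to -b workers=N, bounds tests running within a batch.
    with ThreadPoolExecutor(max_workers=batch_workers) as pool:
        return list(pool.map(run_test, batch))


def run_all(batches: List[List[str]], submit_workers: int, batch_workers: int) -> List[List[str]]:
    # Outer pool: analogous to --workers=N, bounds batches in flight at once.
    with ThreadPoolExecutor(max_workers=submit_workers) as pool:
        return list(pool.map(lambda b: run_batch(b, batch_workers), batches))


results = run_all([["t1", "t2"], ["t3"]], submit_workers=1, batch_workers=2)
print(results)  # [['t1: done', 't2: done'], ['t3: done']]
```

With submit_workers=1 the batches run one after another, while batch_workers still controls parallelism inside each batch.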
Examples#
Run the canary example suite in 4 batches
$ canary run --workers=1 -b scheduler=shell -b spec=count:4 .
INFO: Initializing empty canary workspace at .
INFO: Collecting generator files from .
INFO: Instantiating generators from collected files
INFO: Generating test specs from generators
WARNING: cmake not found, jobs cannot be generated
INFO: Searching for duplicated tests
INFO: Resolving test spec dependencies
INFO: Generated 81 test specs from 38 generators
INFO: Excluded 1 test spec during generation
Reason                                               Count
──────────────────────────────────────────────────────────
options=enable evaluated to False for options=[]     1
INFO: Caching test specs
INFO: Created selection 'aqua-crystal'
INFO: Selecting test jobs based on runtime environment
INFO: Excluded 13 test jobs
Reason                         Count
────────────────────────────────────
insufficient slots of cpus     10
Resource unavailable: gpus     3
INFO: Starting session 2026-06-04T20-44-08.214714
INFO: Generated 4 test batches from 67 jobs
INFO: Starting process pool with max 1 workers
Job                    ID       Status                                         Elapsed  Rank
──────────────────────────────────────────────────────────────────────────────────────────────────
TestBatch(id=5f9a152)  5f9a152  SUBMITTED                                               1/4
TestBatch(id=5f9a152)  5f9a152  STARTED                                                 1/4
TestBatch(id=5f9a152…  5f9a152  FAIL (21 SUCCESS, 2 XFAIL, 1 SKIPPED, 1 DIFF…  15.4s    1/4
TestBatch(id=275bcd4)  275bcd4  SUBMITTED                                               2/4
TestBatch(id=275bcd4)  275bcd4  STARTED                                                 2/4
TestBatch(id=275bcd4…  275bcd4  FAIL (23 SUCCESS, 1 XDIFF, 2 FAILED, 1 TIMEO…  14.1s    2/4
TestBatch(id=3265f07)  3265f07  SUBMITTED                                               3/4
TestBatch(id=3265f07)  3265f07  STARTED                                                 3/4
TestBatch(id=3265f07…  3265f07  PASS (3 SUCCESS)                               3.0s     3/4
TestBatch(id=18fd576)  18fd576  SUBMITTED                                               4/4
TestBatch(id=18fd576)  18fd576  STARTED                                                 4/4
TestBatch(id=18fd576…  18fd576  PASS (9 SUCCESS)                               6.1s     4/4
┌────────────┬─────────┬────────────────┬─────────┬─────────────────────────────────────────────┐
│ Job        │ ID      │ Status         │ Elapsed │ Details                                     │
├────────────┼─────────┼────────────────┼─────────┼─────────────────────────────────────────────┤
│ diff       │ 8e033df │ FAIL (DIFFED)  │ 0.3s    │ Test exited with diff exit code = 64        │
│ skip       │ e0e106b │ SKIP (SKIPPED) │ 0.3s    │ Test exited with skip exit code = 80        │
│ timeout    │ 3afa81a │ FAIL (TIMEOUT) │ 2.2s    │ Job timed out after 2.0 s.                  │
│ xdiff-fail │ 7c452d5 │ FAIL (FAILED)  │ 0.3s    │ xdiff-fail: expected test to diff           │
│ willfail   │ cd68ac3 │ FAIL (FAILED)  │ 0.1s    │ Test exited with exit code = 1              │
│ fail       │ de70161 │ FAIL (FAILED)  │ 0.3s    │ Test exited with exit code = 65             │
│ timeout    │ c11972b │ FAIL (TIMEOUT) │ 2.2s    │ Job timed out after 2.0 s.                  │
│ xfail-fail │ 327c2f3 │ FAIL (FAILED)  │ 0.3s    │ xfail-fail: expected to exit with code != 0 │
└────────────┴─────────┴────────────────┴─────────┴─────────────────────────────────────────────┘
67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:38
INFO: Finished session in 38.75 s. with returncode 14
INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/latest/src/canary/examples/TestResults
Run the canary example suite in 4 batches, running tests in serial in each batch
$ canary run --workers=1 -b scheduler=shell -b spec=count:4 -b workers=1 .
INFO: Initializing empty canary workspace at .
INFO: Collecting generator files from .
INFO: Instantiating generators from collected files
INFO: Generating test specs from generators
WARNING: cmake not found, jobs cannot be generated
INFO: Searching for duplicated tests
INFO: Resolving test spec dependencies
INFO: Generated 81 test specs from 38 generators
INFO: Excluded 1 test spec during generation
Reason                                               Count
──────────────────────────────────────────────────────────
options=enable evaluated to False for options=[]     1
INFO: Caching test specs
INFO: Created selection 'ivory-swan'
INFO: Selecting test jobs based on runtime environment
INFO: Excluded 13 test jobs
Reason                         Count
────────────────────────────────────
insufficient slots of cpus     10
Resource unavailable: gpus     3
INFO: Starting session 2026-06-04T20-44-48.148579
INFO: Generated 4 test batches from 67 jobs
INFO: Starting process pool with max 1 workers
Job                    ID       Status                                         Elapsed  Rank
──────────────────────────────────────────────────────────────────────────────────────────────────
TestBatch(id=d65b232)  d65b232  SUBMITTED                                               1/4
TestBatch(id=d65b232)  d65b232  STARTED                                                 1/4
TestBatch(id=d65b232…  d65b232  FAIL (21 SUCCESS, 1 XFAIL, 1 DIFFED, 4 FAILE…  15.4s    1/4
TestBatch(id=d4ac7a2)  d4ac7a2  SUBMITTED                                               2/4
TestBatch(id=d4ac7a2)  d4ac7a2  STARTED                                                 2/4
TestBatch(id=d4ac7a2…  d4ac7a2  FAIL (23 SUCCESS, 1 XDIFF, 1 XFAIL, 1 SKIPPE…  14.1s    2/4
TestBatch(id=24f747a)  24f747a  SUBMITTED                                               3/4
TestBatch(id=24f747a)  24f747a  STARTED                                                 3/4
TestBatch(id=24f747a…  24f747a  PASS (3 SUCCESS)                               3.0s     3/4
TestBatch(id=45a269f)  45a269f  SUBMITTED                                               4/4
TestBatch(id=45a269f)  45a269f  STARTED                                                 4/4
TestBatch(id=45a269f…  45a269f  PASS (9 SUCCESS)                               6.1s     4/4
┌────────────┬─────────┬────────────────┬─────────┬─────────────────────────────────────────────┐
│ Job        │ ID      │ Status         │ Elapsed │ Details                                     │
├────────────┼─────────┼────────────────┼─────────┼─────────────────────────────────────────────┤
│ diff       │ 8e033df │ FAIL (DIFFED)  │ 0.3s    │ Test exited with diff exit code = 64        │
│ fail       │ de70161 │ FAIL (FAILED)  │ 0.3s    │ Test exited with exit code = 65             │
│ timeout    │ c11972b │ FAIL (TIMEOUT) │ 2.2s    │ Job timed out after 2.0 s.                  │
│ xdiff-fail │ 7c452d5 │ FAIL (FAILED)  │ 0.3s    │ xdiff-fail: expected test to diff           │
│ xfail-fail │ 327c2f3 │ FAIL (FAILED)  │ 0.3s    │ xfail-fail: expected to exit with code != 0 │
│ willfail   │ cd68ac3 │ FAIL (FAILED)  │ 0.1s    │ Test exited with exit code = 1              │
│ skip       │ e0e106b │ SKIP (SKIPPED) │ 0.3s    │ Test exited with skip exit code = 80        │
│ timeout    │ 3afa81a │ FAIL (TIMEOUT) │ 2.2s    │ Job timed out after 2.0 s.                  │
└────────────┴─────────┴────────────────┴─────────┴─────────────────────────────────────────────┘
67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:38
INFO: Finished session in 38.74 s. with returncode 14
INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/latest/src/canary/examples/TestResults