Running tests in a scheduler

Tests can be run under a workload manager (scheduler) such as Slurm or PBS by adding the following options to canary run:

canary run [-b spec=(duration:T|count:{max,auto,N})[,layout:{flat,atomic}][,nodes:{any,same}]] -b scheduler=SCHEDULER -b workers=N ...

When run in “batch” mode, canary will group tests into “batches” and submit each batch to SCHEDULER.

Batching options

Batch spec

  • if duration:T: create batches with approximate run length of T seconds

  • if count:max: one test per batch

  • if count:auto: auto batch depending on other options

  • if count:N: create at most N batches

  • if layout:flat: batches have no intra-batch dependencies but may have inter-batch dependencies

  • if layout:atomic: batches have no inter-batch dependencies but may have intra-batch dependencies

  • if nodes:any: tests are batched without regard to the node count of test cases

  • if nodes:same: tests are batched with tests having the same node count

The default batch spec is duration:30m,nodes:any,layout:flat.
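To make the duration and nodes settings concrete, the following is a minimal sketch of one way tests could be packed into batches. It is not canary's actual partitioning algorithm: the function `batch_by_duration`, the `(name, runtime, node_count)` tuples, and the greedy longest-first strategy are all assumptions made for illustration, and it assumes that nodes:any places no restriction on which node counts may share a batch.

```python
from collections import defaultdict


def batch_by_duration(tests, max_seconds, nodes="any"):
    """Greedily pack (name, runtime_s, node_count) tuples into batches.

    Hypothetical illustration only; canary's real partitioning may differ.
    """
    # nodes="same": only tests with equal node counts may share a batch.
    # nodes="any": all tests fall into a single group (assumed semantics).
    groups = defaultdict(list)
    for test in tests:
        key = test[2] if nodes == "same" else None
        groups[key].append(test)

    batches = []
    for group in groups.values():
        current, total = [], 0.0
        # Consider the longest tests first
        for test in sorted(group, key=lambda t: t[1], reverse=True):
            # Close the current batch if adding this test would exceed the target
            if current and total + test[1] > max_seconds:
                batches.append(current)
                current, total = [], 0.0
            current.append(test)
            total += test[1]
        if current:
            batches.append(current)
    return batches


tests = [("a", 20, 1), ("b", 15, 1), ("c", 10, 2), ("d", 5, 2)]
any_batches = batch_by_duration(tests, 30, nodes="any")
same_batches = batch_by_duration(tests, 30, nodes="same")
print(len(any_batches), len(same_batches))  # 2 3
```

With a 30 second target, the sketch produces two batches when node counts are ignored and three when tests must share a node count, since the one-node and two-node tests can no longer be combined.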

Note

-b spec=count:N and -b spec=duration:T are mutually exclusive.
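The spec syntax and the exclusivity rule can be sketched as a small parser. This is a hypothetical helper, not canary's own parser: the name `parse_batch_spec` and the error messages are assumptions, and values are kept as plain strings rather than being converted to numbers or durations.

```python
def parse_batch_spec(spec):
    """Parse a spec such as 'duration:30m,nodes:any' into a dict.

    Hypothetical helper for illustration; not canary's actual parser.
    """
    # Defaults taken from the documented default batch spec
    parsed = {"nodes": "any", "layout": "flat"}
    for item in spec.split(","):
        key, sep, value = item.partition(":")
        if not sep or key not in {"duration", "count", "layout", "nodes"}:
            raise ValueError(f"invalid batch spec item: {item!r}")
        parsed[key] = value
    # count and duration may not be combined
    if "duration" in parsed and "count" in parsed:
        raise ValueError("count and duration are mutually exclusive")
    # Fall back to the documented default duration when neither is given
    if "duration" not in parsed and "count" not in parsed:
        parsed["duration"] = "30m"
    return parsed


print(parse_batch_spec("count:4,layout:atomic"))
# {'nodes': 'any', 'layout': 'atomic', 'count': '4'}
```

Calling `parse_batch_spec("count:4,duration:30m")` raises `ValueError` in this sketch, mirroring the rule stated in the note.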

Batch scheduler

  • -b scheduler=S: use scheduler S to run batches.

  • -b option=option: pass option to the scheduler. If option contains commas, it is split into multiple options at the commas. For example, -b option="-q debug,-A ABC123" passes -q debug and -A ABC123 directly to the scheduler.
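The comma splitting described for -b option can be illustrated in a few lines. The function name `split_scheduler_options` is hypothetical, and stripping surrounding whitespace and dropping empty pieces are assumptions; canary's own handling may differ in those details.

```python
def split_scheduler_options(option):
    """Split a comma-separated option string into individual options.

    Hypothetical illustration; whitespace stripping is an assumption.
    """
    return [part.strip() for part in option.split(",") if part.strip()]


opts = split_scheduler_options("-q debug,-A ABC123")
print(opts)  # ['-q debug', '-A ABC123']
```

One consequence of splitting at commas is that a scheduler option whose own value contains a comma cannot be passed in a single piece with this simple approach.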

The following schedulers are currently supported:

  • shell (run batches in subprocess of the current shell)

  • slurm

  • flux

  • PBS

Note

The shell scheduler is not performant; its primary use is running examples on machines that do not have an actual batch scheduler set up.

Batch concurrency

Batch concurrency can be controlled by

  • --workers=N: Submit N concurrent batches to the scheduler at any one time. The default is 5.

  • -b workers=N: Execute the tests within each batch asynchronously using a pool of at most N workers. By default, the maximum number of available workers is used.
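The two settings act at different levels: one limits how many batches are in flight, the other limits how many tests run at once inside a batch. The sketch below models this with nested thread pools. It is only an analogy under stated assumptions: `run_test`, `run_batch`, and `run_session` are hypothetical names, and threads stand in for the scheduler submissions and worker processes that canary manages.

```python
from concurrent.futures import ThreadPoolExecutor


def run_test(name):
    # Stand-in for executing one test; a real run would launch a process
    return f"{name}: SUCCESS"


def run_batch(batch, batch_workers):
    # Inner pool: analogous to -b workers=N (tests running within a batch)
    with ThreadPoolExecutor(max_workers=batch_workers) as pool:
        return list(pool.map(run_test, batch))


def run_session(batches, workers, batch_workers):
    # Outer pool: analogous to --workers=N (batches in flight at once)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: run_batch(b, batch_workers), batches))


results = run_session([["t1", "t2"], ["t3"]], workers=1, batch_workers=1)
print(results)  # [['t1: SUCCESS', 't2: SUCCESS'], ['t3: SUCCESS']]
```

Setting both values to 1, as here, corresponds to one batch at a time with its tests run one after another.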

Examples

  • Run the canary example suite in 4 batches

    $ canary run --workers=1 -b scheduler=shell -b spec=count:4 .
    INFO: Initializing empty canary workspace at .
    INFO: Collecting generator files from .
    INFO: Instantiating generators from collected files
    INFO: Generating test specs from generators
    WARNING: cmake not found, jobs cannot be generated
    INFO: Searching for duplicated tests
    INFO: Resolving test spec dependencies
    INFO: Generated 81 test specs from 38 generators
    INFO: Excluded 1 test spec during generation
                                                                
      Reason                                             Count  
     ────────────────────────────────────────────────────────── 
      options=enable evaluated to False for options=[]       1  
                                                                
    INFO: Caching test specs
    INFO: Created selection 'aqua-crystal'
    INFO: Selecting test jobs based on runtime environment
    INFO: Excluded 13 test jobs
                                          
      Reason                       Count  
     ──────────────────────────────────── 
      insufficient slots of cpus      10  
      Resource unavailable: gpus       3  
                                          
    INFO: Starting session 2026-06-04T20-44-08.214714
    INFO: Generated 4 test batches from 67 jobs
    INFO: Starting process pool with max 1 workers
    Job                    ID        Status                                          Elapsed      Rank  
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    TestBatch(id=5f9a152)  5f9a152   SUBMITTED                                                     1/4  
    TestBatch(id=5f9a152)  5f9a152   STARTED                                                       1/4  
    TestBatch(id=5f9a152…  5f9a152   FAIL (21 SUCCESS, 2 XFAIL, 1 SKIPPED, 1 DIFF…     15.4s       1/4  
    TestBatch(id=275bcd4)  275bcd4   SUBMITTED                                                     2/4  
    TestBatch(id=275bcd4)  275bcd4   STARTED                                                       2/4  
    TestBatch(id=275bcd4…  275bcd4   FAIL (23 SUCCESS, 1 XDIFF, 2 FAILED, 1 TIMEO…     14.1s       2/4  
    TestBatch(id=3265f07)  3265f07   SUBMITTED                                                     3/4  
    TestBatch(id=3265f07)  3265f07   STARTED                                                       3/4  
    TestBatch(id=3265f07…  3265f07   PASS (3 SUCCESS)                                   3.0s       3/4  
    TestBatch(id=18fd576)  18fd576   SUBMITTED                                                     4/4  
    TestBatch(id=18fd576)  18fd576   STARTED                                                       4/4  
    TestBatch(id=18fd576…  18fd576   PASS (9 SUCCESS)                                   6.1s       4/4  
    ┌────────────┬─────────┬────────────────┬─────────┬─────────────────────────────────────────────┐
    │ Job        │ ID      │ Status         │ Elapsed │ Details                                     │
    ├────────────┼─────────┼────────────────┼─────────┼─────────────────────────────────────────────┤
    │ diff       │ 8e033df │ FAIL (DIFFED)  │    0.3s │ Test exited with diff exit code = 64        │
    │ skip       │ e0e106b │ SKIP (SKIPPED) │    0.3s │ Test exited with skip exit code = 80        │
    │ timeout    │ 3afa81a │ FAIL (TIMEOUT) │    2.2s │ Job timed out after 2.0 s.                  │
    │ xdiff-fail │ 7c452d5 │ FAIL (FAILED)  │    0.3s │ xdiff-fail: expected test to diff           │
    │ willfail   │ cd68ac3 │ FAIL (FAILED)  │    0.1s │ Test exited with exit code = 1              │
    │ fail       │ de70161 │ FAIL (FAILED)  │    0.3s │ Test exited with exit code = 65             │
    │ timeout    │ c11972b │ FAIL (TIMEOUT) │    2.2s │ Job timed out after 2.0 s.                  │
    │ xfail-fail │ 327c2f3 │ FAIL (FAILED)  │    0.3s │ xfail-fail: expected to exit with code != 0 │
    └────────────┴─────────┴────────────────┴─────────┴─────────────────────────────────────────────┘
     67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:38                    
    INFO: Finished session in 38.75 s. with returncode 14
    INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/latest/src/canary/examples/TestResults
    
  • Run the canary example suite in 4 batches, running the tests serially within each batch

    $ canary run --workers=1 -b scheduler=shell -b spec=count:4 -b workers=1 .
    INFO: Initializing empty canary workspace at .
    INFO: Collecting generator files from .
    INFO: Instantiating generators from collected files
    INFO: Generating test specs from generators
    WARNING: cmake not found, jobs cannot be generated
    INFO: Searching for duplicated tests
    INFO: Resolving test spec dependencies
    INFO: Generated 81 test specs from 38 generators
    INFO: Excluded 1 test spec during generation
                                                                
      Reason                                             Count  
     ────────────────────────────────────────────────────────── 
      options=enable evaluated to False for options=[]       1  
                                                                
    INFO: Caching test specs
    INFO: Created selection 'ivory-swan'
    INFO: Selecting test jobs based on runtime environment
    INFO: Excluded 13 test jobs
                                          
      Reason                       Count  
     ──────────────────────────────────── 
      insufficient slots of cpus      10  
      Resource unavailable: gpus       3  
                                          
    INFO: Starting session 2026-06-04T20-44-48.148579
    INFO: Generated 4 test batches from 67 jobs
    INFO: Starting process pool with max 1 workers
    Job                    ID        Status                                          Elapsed      Rank  
    ──────────────────────────────────────────────────────────────────────────────────────────────────
    TestBatch(id=d65b232)  d65b232   SUBMITTED                                                     1/4  
    TestBatch(id=d65b232)  d65b232   STARTED                                                       1/4  
    TestBatch(id=d65b232…  d65b232   FAIL (21 SUCCESS, 1 XFAIL, 1 DIFFED, 4 FAILE…     15.4s       1/4  
    TestBatch(id=d4ac7a2)  d4ac7a2   SUBMITTED                                                     2/4  
    TestBatch(id=d4ac7a2)  d4ac7a2   STARTED                                                       2/4  
    TestBatch(id=d4ac7a2…  d4ac7a2   FAIL (23 SUCCESS, 1 XDIFF, 1 XFAIL, 1 SKIPPE…     14.1s       2/4  
    TestBatch(id=24f747a)  24f747a   SUBMITTED                                                     3/4  
    TestBatch(id=24f747a)  24f747a   STARTED                                                       3/4  
    TestBatch(id=24f747a…  24f747a   PASS (3 SUCCESS)                                   3.0s       3/4  
    TestBatch(id=45a269f)  45a269f   SUBMITTED                                                     4/4  
    TestBatch(id=45a269f)  45a269f   STARTED                                                       4/4  
    TestBatch(id=45a269f…  45a269f   PASS (9 SUCCESS)                                   6.1s       4/4  
    ┌────────────┬─────────┬────────────────┬─────────┬─────────────────────────────────────────────┐
    │ Job        │ ID      │ Status         │ Elapsed │ Details                                     │
    ├────────────┼─────────┼────────────────┼─────────┼─────────────────────────────────────────────┤
    │ diff       │ 8e033df │ FAIL (DIFFED)  │    0.3s │ Test exited with diff exit code = 64        │
    │ fail       │ de70161 │ FAIL (FAILED)  │    0.3s │ Test exited with exit code = 65             │
    │ timeout    │ c11972b │ FAIL (TIMEOUT) │    2.2s │ Job timed out after 2.0 s.                  │
    │ xdiff-fail │ 7c452d5 │ FAIL (FAILED)  │    0.3s │ xdiff-fail: expected test to diff           │
    │ xfail-fail │ 327c2f3 │ FAIL (FAILED)  │    0.3s │ xfail-fail: expected to exit with code != 0 │
    │ willfail   │ cd68ac3 │ FAIL (FAILED)  │    0.1s │ Test exited with exit code = 1              │
    │ skip       │ e0e106b │ SKIP (SKIPPED) │    0.3s │ Test exited with skip exit code = 80        │
    │ timeout    │ 3afa81a │ FAIL (TIMEOUT) │    2.2s │ Job timed out after 2.0 s.                  │
    └────────────┴─────────┴────────────────┴─────────┴─────────────────────────────────────────────┘
     67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:38                    
    INFO: Finished session in 38.74 s. with returncode 14
    INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/latest/src/canary/examples/TestResults