Running tests in a scheduler

Tests can be run under a workload manager (scheduler) such as Slurm or PBS by adding the following options to canary run:

canary run [-b spec=(duration:T|count:{max,auto,N})[,layout:{flat,atomic}][,nodes:{any,same}]] -b scheduler=SCHEDULER -b workers=N ...

When run in “batch” mode, canary will group tests into “batches” and submit each batch to SCHEDULER.

Batching options

Batch spec

  • if duration:T: create batches with approximate run length of T seconds

  • if count:max: one test per batch

  • if count:auto: auto batch depending on other options

  • if count:N: create at most N batches

  • if layout:flat: batches have no intra-batch dependencies but may have inter-batch dependencies

  • if layout:atomic: batches have no inter-batch dependencies but may have intra-batch dependencies

  • if nodes:any: tests are batched without regard to the node count of test cases

  • if nodes:same: tests are batched with tests having the same node count

The default batch spec is duration:30m,nodes:any,layout:flat.

Note

-b spec=count:N and -b spec=duration:T are mutually exclusive.
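As a minimal sketch of how a batch spec can be put together, the snippet below joins spec fields with commas and refuses to combine count with duration, mirroring the note above. The make_spec helper is hypothetical and not part of canary; the final line only prints a command rather than running it, and assumes a Slurm system.

```shell
# Hypothetical helper (not part of canary): join batch spec fields with
# commas and reject specs that set both count and duration.
make_spec() {
  local spec have_count have_duration field
  spec=""
  have_count=0
  have_duration=0
  for field in "$@"; do
    case "$field" in
      count:*) have_count=1 ;;
      duration:*) have_duration=1 ;;
    esac
    # Append the field, inserting a comma only between fields.
    spec="${spec:+$spec,}$field"
  done
  if [ "$have_count" -eq 1 ] && [ "$have_duration" -eq 1 ]; then
    echo "error: count and duration are mutually exclusive" >&2
    return 1
  fi
  printf '%s\n' "$spec"
}

spec=$(make_spec duration:30m layout:atomic nodes:same)
# Print the resulting command instead of executing it.
echo "canary run -b scheduler=slurm -b spec=$spec ."
```

Calling make_spec count:4 duration:30m returns a non-zero status and prints an error instead of a spec.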

Batch scheduler

  • -b scheduler=S: use scheduler S to run batches.

  • -b option=option: pass option to the scheduler. If option contains commas, it is split into multiple options at the commas. For example, -b option="-q debug,-A ABC123" passes -q debug and -A ABC123 directly to the scheduler.
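The comma splitting described above can be illustrated with a short shell sketch. The split_options function is hypothetical and only mimics the documented behavior; it is not how canary itself is implemented.

```shell
# Illustration only (not canary's code): split a comma-separated option
# string into separate scheduler options, one per line.
split_options() {
  local old_ifs piece
  old_ifs=$IFS
  IFS=,
  set -f          # disable globbing while the string is expanded unquoted
  set -- $1       # field-split the argument at commas
  set +f
  IFS=$old_ifs
  for piece in "$@"; do
    printf '%s\n' "$piece"
  done
}

split_options "-q debug,-A ABC123"
```

Each printed line corresponds to one option that would be handed to the scheduler.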

The following schedulers are currently supported:

  • shell (run batches in subprocess of the current shell)

  • slurm

  • flux

  • PBS

Note

The shell scheduler is not performant; its primary use is running examples on machines that don’t have an actual batch scheduler set up.
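One way to choose a scheduler in a script is to probe for a submission tool and fall back to shell when none is found. The sketch below assumes that sbatch on the PATH indicates Slurm and that flux on the PATH indicates Flux; the pick_scheduler helper is hypothetical, and the final line prints the command rather than running it.

```shell
# Hypothetical helper (not part of canary): choose a scheduler name by
# probing for the tools that Slurm and Flux normally install.
pick_scheduler() {
  if command -v sbatch >/dev/null 2>&1; then
    echo slurm
  elif command -v flux >/dev/null 2>&1; then
    echo flux
  else
    # No batch system detected: fall back to the shell scheduler.
    echo shell
  fi
}

scheduler=$(pick_scheduler)
echo "canary run -b scheduler=$scheduler ."
```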

Batch concurrency

Batch concurrency can be controlled by:

  • --workers=N: Submit N concurrent batches to the scheduler at any one time. The default is 5.

  • -b workers=N: Execute the tests within each batch asynchronously using a pool of at most N workers. By default, the maximum number of available workers is used.
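To see how the two settings combine, the sketch below prints a command that uses both and a rough upper bound on the number of tests running at once. The bound assumes each worker inside a batch runs one test at a time, which the text above does not state; the variable names and values are illustrative.

```shell
batches=2      # value passed to --workers: batches submitted concurrently
per_batch=4    # value passed to -b workers: worker pool size inside each batch

# Print the command instead of executing it.
echo "canary run --workers=$batches -b scheduler=slurm -b workers=$per_batch ."

# Assumed upper bound: concurrent batches times workers per batch.
echo "upper bound on simultaneous tests: $((batches * per_batch))"
```

Setting -b workers=1 makes the tests in each batch run serially, so under the same assumption the bound reduces to the --workers value.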

Examples

  • Run the canary example suite in 4 batches

    $ canary run --workers=1 -b scheduler=shell -b spec=count:4 .
    INFO: Initializing empty canary workspace at .
    INFO: Collecting generator files from .
    INFO: Instantiating generators from collected files
    INFO: Generating test specs from generators
    WARNING: cmake not found, test cases cannot be generated
    INFO: Searching for duplicated tests
    INFO: Resolving test spec dependencies
    INFO: Generated 81 test specs from 38 generators
    INFO: Excluded 1 test spec during generation
                                                                
      Reason                                             Count  
     ────────────────────────────────────────────────────────── 
      options=enable evaluated to False for options=[]       1  
                                                                
    INFO: Caching test specs
    INFO: Created selection 'silent-gorge'
    INFO: Selecting test cases based on runtime environment
    INFO: Excluded 13 test cases
                                          
      Reason                       Count  
     ──────────────────────────────────── 
      insufficient slots of cpus      10  
      Resource unavailable: gpus       3  
                                          
    INFO: Starting session 2026-04-21T15-33-20.458951
    INFO: Generated 4 test batches from 67 test cases
    INFO: Starting process pool with max 1 workers
    Job                    ID        Status                                           Queued   Elapsed      Rank  
    ────────────────────────────────────────────────────────────────────────────────────────────────────────────
    TestBatch(id=d8d260c)  d8d260c   SUBMITTED                                                               1/4  
    TestBatch(id=d8d260c)  d8d260c   STARTED                                            1.3s                 1/4  
    TestBatch(id=d8d260c)  d8d260c   NONE (16 PASS, 1 SKIPPED, 1 FAILED)                1.3s      4.3s       1/4  
    TestBatch(id=7135212)  7135212   SUBMITTED                                                               2/4  
    TestBatch(id=7135212)  7135212   STARTED                                            1.0s                 2/4  
    TestBatch(id=7135212)  7135212   NONE (18 PASS, 2 TIMEOUT)                          1.0s      6.0s       2/4  
    TestBatch(id=b05eef3)  b05eef3   SUBMITTED                                                               3/4  
    TestBatch(id=b05eef3)  b05eef3   STARTED                                            1.0s                 3/4  
    TestBatch(id=b05eef3)  b05eef3   NONE (16 PASS, 1 DIFFED, 3 FAILED)                 1.0s      4.0s       3/4  
    TestBatch(id=f3ca50c)  f3ca50c   SUBMITTED                                                               4/4  
    TestBatch(id=f3ca50c)  f3ca50c   STARTED                                            1.0s                 4/4  
    TestBatch(id=f3ca50c)  f3ca50c   NONE (9 PASS)                                      1.0s      3.0s       4/4  
    ┌──────────────┬────────────┬────────────────┬───────────┬───────────┬─────────────────────────────────────────────────┐
    │ Job          │ ID         │ Status         │    Queued │   Elapsed │ Details                                         │
    ├──────────────┼────────────┼────────────────┼───────────┼───────────┼─────────────────────────────────────────────────┤
    │ skip         │ e01f382    │ SKIP (SKIPPED) │      0.0s │      0.2s │ Test exited with skip exit code = 80            │
    │ xdiff-fail   │ f4263d4    │ FAIL (FAILED)  │      0.0s │      0.2s │ xdiff-fail: expected test to diff               │
    │ timeout      │ 1c2f507    │ FAIL (TIMEOUT) │      0.0s │      2.2s │ Job timed out after 2.0 s.                      │
    │ timeout      │ 7127eb6    │ FAIL (TIMEOUT) │      0.0s │      2.2s │ Job timed out after 2.0 s.                      │
    │ diff         │ b4597e3    │ FAIL (DIFFED)  │      0.0s │      0.2s │ Test exited with diff exit code = 64            │
    │ fail         │ a850e81    │ FAIL (FAILED)  │      0.0s │      0.2s │ Test exited with exit code = 65                 │
    │ xfail-fail   │ 9db7d1b    │ FAIL (FAILED)  │      0.0s │      0.2s │ xfail-fail: expected to exit with code != 0     │
    │ willfail     │ e4caa24    │ FAIL (FAILED)  │      0.0s │      0.1s │ Test exited with exit code = 1                  │
    └──────────────┴────────────┴────────────────┴───────────┴───────────┴─────────────────────────────────────────────────┘
     67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:17                    
    INFO: Finished session in 17.49 s. with returncode 14
    INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/release-26.4.16/src/canary/examples/TestResults
    
  • Run the canary example suite in 4 batches, running tests in serial in each batch

    $ canary run --workers=1 -b scheduler=shell -b spec=count:4 -b workers=1 .
    INFO: Initializing empty canary workspace at .
    INFO: Collecting generator files from .
    INFO: Instantiating generators from collected files
    INFO: Generating test specs from generators
    WARNING: cmake not found, test cases cannot be generated
    INFO: Searching for duplicated tests
    INFO: Resolving test spec dependencies
    INFO: Generated 81 test specs from 38 generators
    INFO: Excluded 1 test spec during generation
                                                                
      Reason                                             Count  
     ────────────────────────────────────────────────────────── 
      options=enable evaluated to False for options=[]       1  
                                                                
    INFO: Caching test specs
    INFO: Created selection 'solar-aurora'
    INFO: Selecting test cases based on runtime environment
    INFO: Excluded 13 test cases
                                          
      Reason                       Count  
     ──────────────────────────────────── 
      insufficient slots of cpus      10  
      Resource unavailable: gpus       3  
                                          
    INFO: Starting session 2026-04-21T15-33-38.772597
    INFO: Generated 4 test batches from 67 test cases
    INFO: Starting process pool with max 1 workers
    Job                    ID        Status                                           Queued   Elapsed      Rank  
    ────────────────────────────────────────────────────────────────────────────────────────────────────────────
    TestBatch(id=5026185)  5026185   SUBMITTED                                                               1/4  
    TestBatch(id=5026185)  5026185   STARTED                                            1.3s                 1/4  
    TestBatch(id=5026185)  5026185   NONE (17 PASS, 1 SKIPPED)                          1.3s      6.3s       1/4  
    TestBatch(id=d8f9169)  d8f9169   SUBMITTED                                                               2/4  
    TestBatch(id=d8f9169)  d8f9169   STARTED                                            1.0s                 2/4  
    TestBatch(id=d8f9169)  d8f9169   NONE (16 PASS, 2 FAILED, 2 TIMEOUT)                1.0s     10.0s       2/4  
    TestBatch(id=a15f049)  a15f049   SUBMITTED                                                               3/4  
    TestBatch(id=a15f049)  a15f049   STARTED                                            1.0s                 3/4  
    TestBatch(id=a15f049)  a15f049   NONE (17 PASS, 1 DIFFED, 2 FAILED)                 1.0s      6.0s       3/4  
    TestBatch(id=3f0d83e)  3f0d83e   SUBMITTED                                                               4/4  
    TestBatch(id=3f0d83e)  3f0d83e   STARTED                                            1.0s                 4/4  
    TestBatch(id=3f0d83e)  3f0d83e   NONE (9 PASS)                                      1.0s      3.0s       4/4  
    ┌──────────────┬────────────┬────────────────┬───────────┬───────────┬─────────────────────────────────────────────────┐
    │ Job          │ ID         │ Status         │    Queued │   Elapsed │ Details                                         │
    ├──────────────┼────────────┼────────────────┼───────────┼───────────┼─────────────────────────────────────────────────┤
    │ skip         │ e01f382    │ SKIP (SKIPPED) │      0.0s │      0.2s │ Test exited with skip exit code = 80            │
    │ timeout      │ 7127eb6    │ FAIL (TIMEOUT) │      0.0s │      2.2s │ Job timed out after 2.0 s.                      │
    │ timeout      │ 1c2f507    │ FAIL (TIMEOUT) │      0.0s │      2.2s │ Job timed out after 2.0 s.                      │
    │ xdiff-fail   │ f4263d4    │ FAIL (FAILED)  │      0.0s │      0.2s │ xdiff-fail: expected test to diff               │
    │ xfail-fail   │ 9db7d1b    │ FAIL (FAILED)  │      0.0s │      0.2s │ xfail-fail: expected to exit with code != 0     │
    │ diff         │ b4597e3    │ FAIL (DIFFED)  │      0.0s │      0.2s │ Test exited with diff exit code = 64            │
    │ fail         │ a850e81    │ FAIL (FAILED)  │      0.0s │      0.2s │ Test exited with exit code = 65                 │
    │ willfail     │ e4caa24    │ FAIL (FAILED)  │      0.0s │      0.1s │ Test exited with exit code = 1                  │
    └──────────────┴────────────┴────────────────┴───────────┴───────────┴─────────────────────────────────────────────────┘
     67/67 COMPLETE, 56 SUCCESS, 1 XDIFF, 2 XFAIL, 1 SKIPPED, 1 DIFFED, 4 FAILED, 2 TIMEOUT, in 00:00:25                    
    INFO: Finished session in 25.50 s. with returncode 14
    INFO: Updating view at /home/docs/checkouts/readthedocs.org/user_builds/canary-wm/checkouts/release-26.4.16/src/canary/examples/TestResults