
Having a job depend on an array job in SLURM


I have two job scripts to submit to SLURM, jobA.sh and jobB.sh. jobA is an array job, and I want jobB to start only once all of jobA has completed. My script for jobA.sh is:

#!/bin/bash
#SBATCH -A TRIGWMS 
#SBATCH --mail-type=FAIL
# cores per task
#SBATCH -c 11
#
#SBATCH --array=%#combo#%%100
#SBATCH -J %#profile#%_%#freq#%
#
# number of nodes
#SBATCH -N 1
#
#SBATCH -t 0-2:00:00
# Standard output is saved in this file
#SBATCH -o myjob_%A_%a.out
#
# Standard error messages are saved in this file
#SBATCH -e myjob_%A_%a.err
#
# set the $OMP_NUM_THREADS variable
export OMP_NUM_THREADS=12
./myjobA_$SLURM_ARRAY_TASK_ID

This job script runs fine, but I cannot get jobB to run after it has finished. The script for jobB.sh is:

#!/bin/bash

#SBATCH -A TRIGWMS 
#SBATCH --mail-type=FAIL
# cores per task
#SBATCH -c 11
#
# number of nodes
#SBATCH -N 1
#SBATCH --ntasks=1

#SBATCH -J MESA
#SBATCH -t 0-2:00:00
# Standard output is saved in this file
#SBATCH -o myjob_%A_%a.out
#
# Standard error messages are saved in this file
#SBATCH -e myjob_%A_%a.err
#
# set the $OMP_NUM_THREADS variable
ompthreads=$SLURM_JOB_CPUS_PER_NODE
export OMP_NUM_THREADS=$ompthreads
./myjobB

This script also works fine, but only if jobA has been run first. To submit both jobs, with jobB dependent on jobA, I used the following script:

#!/bin/bash

FIRST=$(sbatch -p bigmem [email protected] jobA.sh)
echo $FIRST
SECOND=$(sbatch --dependency=afterany:$FIRST jobB.sh)
echo $SECOND

exit 0

but this submits only the first job and then fails with the error 'sbatch: error: Unable to open file batch'. (I originally had -p bigmem --mail etc. in there, but took them out just to check.) The issue is with the --dependency part: once I remove it, both jobs are submitted, but I need jobB to start after jobA has finished.


Solution

  • You should submit your first job with the --parsable option.

    FIRST=$(sbatch -p bigmem [email protected] --parsable jobA.sh)
    

    Otherwise, the FIRST variable contains a string similar to:

    Submitted batch job 123456789
    

    So your second line looks like this after variable expansion by Bash:

    SECOND=$(sbatch --dependency=afterany:Submitted batch job 123456789 jobB.sh)
    

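    After word splitting, sbatch receives --dependency=afterany:Submitted as the option, so it tries to find a script named batch and run it with the arguments job, 123456789 and jobB.sh. That is where 'Unable to open file batch' comes from. With the --parsable option, sbatch prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups), and your line should work as is.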

    If your cluster runs a version of Slurm that is too old, the --parsable option might not be available, in which case you can follow this advice.
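
Putting the pieces together, a minimal sketch of the submit script could look like the following. The function names job_id_from_output and submit_chain are hypothetical helpers introduced only for illustration, and the mail option from the question is left out because its value is not shown. The helper reduces whatever sbatch printed to the bare numeric ID, so it copes with the plain "Submitted batch job N" message as well as the --parsable forms.

```shell
#!/bin/bash
# Sketch: submit jobA.sh (an array job), then jobB.sh once all of its tasks have ended.

# Hypothetical helper: reduce sbatch's output to the bare numeric job ID.
job_id_from_output() {
    local output="$1"
    output="${output%%;*}"   # --parsable may print "jobid;cluster": drop the cluster part
    echo "${output##* }"     # "Submitted batch job 123" -> "123"; a bare ID is unchanged
}

# Hypothetical wrapper around the two submissions.
submit_chain() {
    local first second
    first=$(sbatch --parsable -p bigmem jobA.sh) || return 1
    first=$(job_id_from_output "$first")
    echo "jobA array job ID: $first"
    # A dependency on the array's job ID covers every task of the array.
    second=$(sbatch --parsable --dependency=afterany:"$first" jobB.sh) || return 1
    echo "jobB job ID: $(job_id_from_output "$second")"
}
```

Calling submit_chain at the end of the script performs both submissions. On a Slurm version without --parsable, dropping that flag from the two sbatch calls should be enough, since the helper keeps only the last word of "Submitted batch job 123456789".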