How to run two multiprocessing programs in one batch using SLURM?


I have a SLURM cluster with several nodes, each with 16 vCPUs. I tried to run the following batch script:

#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH -c 16

srun --exclusive --nodes=1 program1 &
srun --exclusive --nodes=1 program2 &
wait

program1 and program2 each need 16 CPUs. I expected that 2 nodes with 32 cores in total would be allocated, with program1 running on the first node and program2 on the second, but I got the following error message:

srun: error: Unable to create step for job 364966: Requested node configuration is not available

If I use only the --nodes and --ntasks options, sbatch allocates 2 nodes with 2 CPUs, and if I use only the --nodes and -c options, I get a message that --ntasks should be defined.

If I set --ntasks=1, SLURM sets nnodes to 1.

How can I run these two programs in one batch job, each on its own node with 16 vCPUs?


Solution

  • The following seems to work:

    #SBATCH --nodes 2
    #SBATCH --exclusive
    
    srun --exclusive --nodes=1 -c 16 --ntasks 1 program1 &
    srun --exclusive --nodes=1 -c 16 --ntasks 1 program2 &
    wait
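
A likely reason the original script failed is that each srun step inherited --ntasks=2 from the batch job while also being limited to --nodes=1, so each step asked for 2 tasks of 16 CPUs on a single 16-CPU node. This has not been confirmed against the cluster's configuration. To check that the two steps really land on different nodes, one option is to run the same layout with hostname in place of the real programs. The sketch below does that. The run_step helper and the step1.out / step2.out file names are hypothetical, introduced only for illustration, and the fallback branch exists only so the script can be dry-run on a machine without SLURM.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --exclusive

# Hypothetical helper (not part of SLURM): launch one single-task,
# 16-CPU job step confined to one node. If srun is not installed,
# run the command directly so the script can be dry-run locally.
run_step() {
    if command -v srun >/dev/null 2>&1; then
        srun --exclusive --nodes=1 --ntasks=1 -c 16 "$@"
    else
        "$@"
    fi
}

# Stand-ins for program1 and program2: each step records the name
# of the node it ran on.
run_step hostname > step1.out &
run_step hostname > step2.out &

# Block until both background steps have finished.
wait
```

If the two output files contain different node names after the job finishes, the steps were placed on separate nodes, and hostname can be swapped back for program1 and program2.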