How to limit the number of programs executed in parallel in SLURM


I am trying to use SLURM to run multiple commands in parallel on my cluster (a single node). This is my situation:

  • I have N commands to run
  • I have M physical cores in my cluster (M=4)

Since every command requires a physical core and M < N, I would like at most M commands to run simultaneously.

The problem is that all N commands start at once when I run the sbatch command. I tried the --ntasks parameter, but with no success. I am probably using the wrong SLURM parameters.

This is the file I am using:

#!/bin/bash
# file name: ./run_parallel_commands.sh
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=1G

./command-1 &
./command-2 &
# ...
./command-N &
wait

I submit it with:

$ sbatch ./run_parallel_commands.sh

Any suggestions? Thank you in advance.


Solution

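  • You are almost there; the #SBATCH parameters are fine. What needs work is how the commands are launched: run each one as its own job step with srun. A command backgrounded with a plain & starts immediately regardless of what was allocated, whereas a job step can be made to wait until a CPU in the allocation is free. Pass --ntasks=1 so that each step runs a single copy of its command; otherwise srun inherits --ntasks=4 from the job and starts four copies. On Slurm 20.11 and later, --exact limits each step to the CPU it requested, so at most four steps run at once; on older releases, --exclusive serves the same purpose.

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    #SBATCH --mem-per-cpu=1G
    
    srun --ntasks=1 --exact ./command-1 &
    srun --ntasks=1 --exact ./command-2 &
    # ...
    srun --ntasks=1 --exact ./command-N &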
    wait
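
When N is large, writing one srun line per command gets tedious. Below is a minimal sketch of the same approach as a loop. It assumes the commands really are named ./command-1 through ./command-N, as in the question. The launch_steps helper and the N environment variable are hypothetical names introduced for illustration. The --ntasks=1 and --exact flags assume Slurm 20.11 or later (on older releases, --exclusive is the usual substitute).

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=1G

# launch_steps COUNT (hypothetical helper): start ./command-1 .. ./command-COUNT,
# each as its own single-task job step, then wait for all of them to finish.
launch_steps() {
    local count=$1 i
    for i in $(seq 1 "$count"); do
        # --ntasks=1: run one copy of the command per step.
        # --exact: give the step only the CPU it requested, so steps beyond
        # the four allocated tasks are deferred until a CPU becomes free.
        srun --ntasks=1 --exact "./command-$i" &
    done
    wait  # block until every background srun has returned
}

# Only launch when running inside a Slurm job.
# N is an assumed environment variable holding the number of commands.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    launch_steps "${N:?set N to the number of commands}"
fi
```

Since sbatch passes the submitting shell's environment to the job by default, something like `N=10 sbatch ./run_parallel_commands.sh` should be enough to set the count. While more steps are queued than there are free CPUs, srun may print messages that step creation is temporarily disabled and being retried; that is the throttling at work, not an error.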