Tags: parallel-processing, slurm, sbatch

slurm seems to be launching more tasks than requested


I'm having trouble getting my head around the way jobs are launched by SLURM from an sbatch script. It seems like SLURM is ignoring the --ntasks argument and launching all the srun tasks in my batch file immediately. Here is an example, using a slight modification of the code from this answer on StackOverflow:

$ salloc --ntasks=1 --ntasks-per-core=1
salloc: Granted job allocation 1172
$ srun -n 1 sleep 10 & time srun -n 1 echo ok
[1] 5023
srun: cluster configuration lacks support for cpu binding
srun: cluster configuration lacks support for cpu binding
ok

real    0m0.052s
user    0m0.004s
sys 0m0.012s

So on my setup the srun echo command is being run immediately, whereas I would expect it to run after the srun sleep 10 command finishes.
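For reference, the sbatch version of my test looks roughly like the sketch below (not my exact script), and it shows the same thing: both steps start immediately even though only one task is requested.

#!/bin/bash
#SBATCH --ntasks=1

# On my setup both steps start right away, even though the
# allocation only asked for one task.
srun -n 1 sleep 10 &
srun -n 1 echo ok

# Wait for the backgrounded step before the script exits.
wait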

I am using SLURM 2.6.5 to schedule and submit jobs on my personal workstation with 8 cores, and I installed it myself—so it's entirely possible the configuration is borked. Here are some relevant parts from the slurm.conf file:

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# COMPUTE NODES
NodeName=Tom NodeAddr=localhost CPUs=7 RealMemory=28100 State=UNKNOWN
PartitionName=Tom Nodes=Tom Default=YES MaxTime=INFINITE State=UP
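
In case it's useful, this is roughly how I check what SLURM thinks the node and partition look like (commands only, output omitted):

scontrol show node Tom
sinfo -N -l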

Here is the output from printenv | grep SLURM after running salloc --ntasks=1:

SLURM_NODELIST=Tom
SLURM_NODE_ALIASES=(null)
SLURM_MEM_PER_CPU=4100
SLURM_NNODES=1
SLURM_JOBID=1185
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_JOB_ID=1185
SLURM_SUBMIT_DIR=/home/tom/
SLURM_NPROCS=1
SLURM_JOB_NODELIST=Tom
SLURM_JOB_CPUS_PER_NODE=1
SLURM_SUBMIT_HOST=Tom
SLURM_JOB_NUM_NODES=1

I'd appreciate any comments or suggestions. Please let me know if any more info is required.

Thanks for reading,

Tom

Update after playing around some more

I have made some progress but I'm still not quite getting the behaviour I want.

If I use --exclusive I can get the echo step to wait for the sleep step:

salloc --ntasks=1
salloc: Granted job allocation 2387
srun -n 1 --exclusive sleep 10 & time srun -n 1 --exclusive echo ok
[1] 16602
ok
[1]+  Done                    srun -n 1 --exclusive sleep 10

real    0m10.094s
user    0m0.017s
sys 0m0.037s

and

salloc --ntasks=2
salloc: Granted job allocation 2388
srun -n 1 --exclusive sleep 10 & time srun -n 1 --exclusive echo ok
[1] 16683
ok

real    0m0.067s
user    0m0.005s
sys 0m0.020s
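
Written as an sbatch script, that test looks roughly like this sketch; with --ntasks=1 the second exclusive step waits for the first, and with --ntasks=2 both start at once.

#!/bin/bash
#SBATCH --ntasks=1

# With a single task in the allocation, --exclusive makes the
# echo step wait until the sleep step releases the task.
srun -n 1 --exclusive sleep 10 &
srun -n 1 --exclusive echo ok

# Wait for the backgrounded sleep step to finish.
wait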

But I still don't know how to do this properly if I'm running a multi-step job where each step needs several processors, e.g.

salloc --ntasks=6
salloc: Granted job allocation 2389
srun -n 2 --exclusive stress -c 2 &
srun -n 2 --exclusive stress -c 2 &
srun -n 2 --exclusive stress -c 2 &

will give me 12 stress processes, as will

salloc --ntasks=6
salloc: Granted job allocation 2390
srun -n 1 --exclusive stress -c 2 &
srun -n 1 --exclusive stress -c 2 &
srun -n 1 --exclusive stress -c 2 &
srun -n 1 --exclusive stress -c 2 &
srun -n 1 --exclusive stress -c 2 &
srun -n 1 --exclusive stress -c 2 &

So what should I do if I want my sbatch script to take 6 processors and start three steps at a time, each with 2 processors? Is it correct to use srun --exclusive -n 1 -c 2 stress -c 2?


Solution

  • I think the missing pieces were the --exclusive and --cpus-per-task arguments. I get the behaviour I'm looking for with

    salloc --ntasks=6
    salloc: Granted job allocation 2457
    srun --exclusive --ntasks=1 --cpus-per-task=2 stress -c 2 &
    srun --exclusive --ntasks=1 --cpus-per-task=2 stress -c 2 &
    srun --exclusive --ntasks=1 --cpus-per-task=2 stress -c 2 &
    srun --exclusive --ntasks=1 --cpus-per-task=2 stress -c 2 &
    

    This launches 6 stress processes at a time (three steps, each with two CPUs); the 4th srun queues up and only starts once one of the first three steps finishes.

    This may be obvious but it took a while to figure out!
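
    Wrapped up as an sbatch script, the whole thing looks roughly like the sketch below (the -t 10 timeout on stress is just there so the example finishes on its own; it's not part of my real workload):

    #!/bin/bash
    #SBATCH --ntasks=6

    # Each step takes one task with two CPUs, so three steps run at
    # a time and the fourth queues until a slot frees up.
    for i in 1 2 3 4; do
        srun --exclusive --ntasks=1 --cpus-per-task=2 stress -c 2 -t 10 &
    done

    # Wait for all steps before the batch script exits.
    wait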