
SLURM - forcing MPI to schedule different ranks on different physical CPUs


I am running an experiment on an 8-node cluster under SLURM. Each CPU has 8 physical cores and is capable of hyperthreading. When running a program with

#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8

mpirun -n 64 bin/hello_world_mpi

it schedules two ranks on the same physical core. Adding the option

#SBATCH --ntasks-per-core=1

makes the submission fail with the error "Batch job submission failed: Requested node configuration is not available". Is SLURM somehow only allocating 4 physical cores per node? How can I fix this?


Solution

  • You can check the available CPU information in your cluster using sinfo -o%C.
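
    For illustration, %C reports the CPU counts in the allocated/idle/other/total format. On an idle 8-node cluster exposing 16 logical CPUs per node (8 cores with hyperthreading) the output might look roughly like this (the numbers are purely illustrative):

        $ sinfo -o%C
        CPUS(A/I/O/T)
        0/128/0/128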

    I wasn't able to find any --ntasks-per-cpu option for sbatch in the documentation. You could try the --ntasks-per-core option instead. As per the documentation:

    --ntasks-per-core=<ntasks> Request the maximum ntasks be invoked on each core. Meant to be used with the --ntasks option. Related to --ntasks-per-node except at the core level instead of the node level. This option will be inherited by srun.
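
    A minimal sketch of that approach, assuming your nodes really do advertise 8 physical cores each in slurm.conf (the directives mirror the job from the question, with --ntasks added as the documentation suggests):

        #!/bin/bash
        #SBATCH --nodes=8
        #SBATCH --ntasks=64
        #SBATCH --ntasks-per-core=1    # at most one rank per physical core

        mpirun -n 64 bin/hello_world_mpi

    If this is still rejected with "Requested node configuration is not available", it is worth checking how many cores per node SLURM itself thinks exist, e.g. with scontrol show node.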

    You could also try --cpus-per-task.

    -c, --cpus-per-task=<ncpus> Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.

    Also please note:

    Beginning with 22.05, srun will not inherit the --cpus-per-task value requested by salloc or sbatch. It must be requested again with the call to srun or set with the SRUN_CPUS_PER_TASK environment variable if desired for the task(s).
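
    As a sketch of the --cpus-per-task route, assuming 2 hardware threads per physical core (adjust to your hardware): reserving both threads of a core for each rank keeps every rank on its own physical core.

        #!/bin/bash
        #SBATCH --nodes=8
        #SBATCH --ntasks-per-node=8
        #SBATCH --cpus-per-task=2    # 2 hardware threads = 1 physical core per rank

        # Per the note above, on Slurm 22.05+ srun no longer inherits
        # --cpus-per-task from sbatch; repeat it if you launch with srun:
        # export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

        mpirun -n 64 bin/hello_world_mpi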