
Hostfile with Mpirun on multinode with slurm


I have two executables that I would like to run as follows: on each node, I want to launch N-1 processes of exe1 and 1 process of exe2.

On a previous Slurm system, this worked with the following script:

#!/bin/bash -l
#SBATCH --job-name=XXX
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --mem=120GB
#SBATCH --time=04:00:00



module purge
module load intel/compiler/2020.1.217
module load openmpi/intel/4.0.5_2020.1.217


# Write each node name once per line, then append the list a second time
scontrol show hostname $SLURM_JOB_NODELIST | perl -ne 'chomp; print "$_\n" x 1' > myhostall
scontrol show hostname $SLURM_JOB_NODELIST | perl -ne 'chomp; print "$_\n" x 1' >> myhostall

mpirun --mca btl_openib_allow_ib 1 --report-bindings -hostfile myhostall -np 2 ./exe1 : -np 2 ./exe2

In this example, I have two nodes with two tasks per node, so exe1 should get one rank on each node, and likewise exe2.

The output of cat myhostall is:

come-0-12
come-0-13
come-0-12
come-0-13

But in my code, when printing the processor name using MPI_GET_PROCESSOR_NAME, both ranks of exe1 print come-0-12 and both ranks of exe2 print come-0-13.

So the question is:

How do I assign N tasks per node to exe1 and M tasks per node to exe2?


Solution

  • You can specify two hostfiles, one per executable, e.g.:

    mpirun -np 2 --hostfile hostfile_1 exe1 : -np 2 --hostfile hostfile_2 exe2

    In each hostfile you can specify how many slots each executable will use on each node.

    For example (see https://www.open-mpi.org/faq/?category=running#mpirun-hostfile for more), if you want both exe1 and exe2 to have 1 CPU on each node, hostfile_1 and hostfile_2 can be identical, or perhaps even the same file:

    node1 slots=1
    node2 slots=1
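
Hostfiles in this format can be generated inside the job script instead of being written by hand. The following is a minimal sketch, not part of the original answer: make_hostfile is a hypothetical helper name, and it assumes the node names arrive one per line, as printed by scontrol show hostnames.

```shell
#!/bin/bash
# Hypothetical helper (not an Open MPI or Slurm tool): read one hostname
# per line on stdin and print "<host> slots=<n>" for each of them.
make_hostfile() {
    local slots="$1"
    local host
    while IFS= read -r host; do
        # Skip blank lines so the hostfile stays clean
        if [ -n "$host" ]; then
            printf '%s slots=%s\n' "$host" "$slots"
        fi
    done
    return 0
}

# Assumed usage inside a Slurm job (one slot per node for each executable):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 1 > hostfile_1
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_hostfile 1 > hostfile_2
```

Changing the argument (for example make_hostfile 3 for hostfile_1) would request a different number of slots per node for that executable.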
    

    However, if hostfile_1 and hostfile_2 contain the same nodes, mpirun will likely redistribute the tasks in whatever way it considers optimal.

    Another approach is to specify the same hostfile for both executables and use the "--map-by node" option (the default behaviour is "--map-by slot"), e.g.:

    mpirun -hostfile hosts.txt -np 2 --map-by node ./exe1 : -hostfile hosts.txt -np 2 --map-by node ./exe2
    

    where hosts.txt contains:

    node1 slots=2
    node2 slots=2
    

    which gives, in my case (OpenMPI-4.0.4):

    EXE1 from processor node1, rank 0 out of 4 processors
    EXE1 from processor node2, rank 1 out of 4 processors
    EXE2 from processor node1, rank 2 out of 4 processors
    EXE2 from processor node2, rank 3 out of 4 processors
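
To extend the --map-by node approach to N ranks of exe1 and M ranks of exe2 per node, the -np values can be computed from the node count. This is a sketch under assumptions rather than a tested recipe: build_mpirun_cmd is a hypothetical helper, the hostfile is assumed to list every node with slots=N+M, and the round-robin placement is assumed to behave as in the two-node output above.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) an mpirun command line that
# requests n ranks of exe1 and m ranks of exe2 on each of <nodes> nodes.
# With --map-by node, ranks are dealt out round-robin across the nodes,
# so nodes*n ranks should land n per node (assumption based on the
# two-node example above).
build_mpirun_cmd() {
    local nodes="$1" n="$2" m="$3" hostfile="$4"
    printf 'mpirun -hostfile %s -np %d --map-by node ./exe1 : -hostfile %s -np %d --map-by node ./exe2\n' \
        "$hostfile" "$((nodes * n))" "$hostfile" "$((nodes * m))"
}

# Two nodes, one rank of each executable per node: reproduces the
# command shown earlier in this answer.
build_mpirun_cmd 2 1 1 hosts.txt
```

For the layout in the question (N-1 ranks of exe1 and 1 rank of exe2 per node, with 4 tasks per node on 2 nodes), the call would be build_mpirun_cmd 2 3 1 hosts.txt, with hosts.txt giving each node slots=4.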
    

    You can also potentially use rankfiles (if you use OpenMPI) to tie tasks to particular CPUs more explicitly, but that can be a bit cumbersome.
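
As an illustration of the rankfile route, the sketch below generates lines in Open MPI's "rank <r>=<host> slot=<core>" rankfile format. It is a sketch under assumptions: make_rankfile is a hypothetical helper, the slot numbers assume that cores 0 to N+M-1 are available to the job on every node (which a Slurm allocation does not guarantee), and it relies on exe1 receiving the lowest ranks because it comes first on the mpirun command line.

```shell
#!/bin/bash
# Hypothetical helper: read hostnames on stdin and print a rankfile that
# places n ranks of exe1 and m ranks of exe2 on every node.
# Ranks 0 .. nodes*n-1 are assumed to belong to exe1, the rest to exe2.
make_rankfile() {
    local n="$1" m="$2"
    local hosts=() host i rank=0
    while IFS= read -r host; do
        if [ -n "$host" ]; then
            hosts+=("$host")
        fi
    done
    # exe1 ranks: cores 0 .. n-1 on each node
    for host in "${hosts[@]}"; do
        for ((i = 0; i < n; i++)); do
            printf 'rank %d=%s slot=%d\n' "$rank" "$host" "$i"
            rank=$((rank + 1))
        done
    done
    # exe2 ranks: cores n .. n+m-1 on each node
    for host in "${hosts[@]}"; do
        for ((i = 0; i < m; i++)); do
            printf 'rank %d=%s slot=%d\n' "$rank" "$host" "$((n + i))"
            rank=$((rank + 1))
        done
    done
}

# Assumed usage inside a Slurm job:
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_rankfile 1 1 > myrankfile
#   mpirun --rankfile myrankfile -np 2 ./exe1 : -np 2 ./exe2
```

For two nodes with N=1 and M=1 this yields the same rank-to-node layout as the --map-by node output above, with each rank additionally pinned to a specific core.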