I am running a code on an architecture with 24 cores per socket and would like to use one MPI rank for each set of three cores sharing an L3 cache block. That gives 8 MPI ranks per socket, 16 per node, with 3 threads per rank. I think the following command line should apply:
mpirun --bind-to l3 -np 16 gmx_mpi mdrun -nt 3
with --bind-to binding the MPI ranks to each block of L3 cache, -np allocating 16 MPI ranks per node, and -nt setting the number of threads per MPI rank to 3. Is this the correct approach?
If each core is capable of multithreading (2 hardware threads), is it right to write
mpirun --bind-to l3 -np 16 gmx_mpi mdrun -nt 6
I assume --bind-to core binds one MPI rank per core, either with no spanning into hardware threads or spanning into 2 threads per core to exploit MT, e.g.
mpirun --bind-to core -np 48 gmx_mpi mdrun -nt 2
with 48 ranks, one per core, on a 2-socket platform and 2 threads per core (MT).
Could you confirm?
Update: the exact option seems to be --bind-to l3cache, so the command becomes
mpirun --bind-to l3cache -np 16 gmx_mpi mdrun -nt 6
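
For reference, here is the arithmetic behind these rank and thread counts as a small shell sketch. The variable names are my own, and the layout (2 sockets, 24 cores per socket, 3 cores per L3 block, 2 hardware threads per core) is assumed from the description above rather than read from the machine. The last line only prints the command and does not run it. GROMACS also documents -ntomp for the number of OpenMP threads per rank, which may be the more appropriate option with an MPI build; I have kept -nt as in the commands above.

```shell
#!/bin/sh
# Assumed node layout (taken from the post, not detected from hardware).
sockets=2
cores_per_socket=24
cores_per_l3=3
hw_threads_per_core=2   # set to 1 if multithreading is disabled

# One MPI rank per L3 cache block.
ranks_per_socket=$((cores_per_socket / cores_per_l3))
ranks_per_node=$((sockets * ranks_per_socket))

# Threads per rank: one per core, or one per hardware thread with MT.
threads_per_rank=$cores_per_l3
threads_per_rank_mt=$((cores_per_l3 * hw_threads_per_core))

echo "ranks per socket: $ranks_per_socket"
echo "ranks per node:   $ranks_per_node"
echo "threads per rank: $threads_per_rank (no MT), $threads_per_rank_mt (MT)"

# Printed for inspection only; whether this binding is optimal is the open question.
echo "mpirun --bind-to l3cache -np $ranks_per_node gmx_mpi mdrun -nt $threads_per_rank_mt"
```

Changing hw_threads_per_core to 1 reproduces the 3-threads-per-rank case from the first command.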