I have a Python file bla.py:
import os
from mpi4py import MPI
import psutil
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
PID = os.getpid()
cpu_affinity = psutil.Process().cpu_num()
print(f'rank: {rank} has PID: {PID} with affinity {cpu_affinity}')
But when I execute it using mpirun --cpu-set 15-20 --bind-to core -n 6 python3 bla.py, I get:
rank: 5 has PID: 2451954 with affinity 16
rank: 2 has PID: 2451923 with affinity 15
rank: 0 has PID: 2451911 with affinity 20
rank: 4 has PID: 2451944 with affinity 18
rank: 3 has PID: 2451935 with affinity 16
rank: 1 has PID: 2451919 with affinity 17
Note how two processes report affinity 16 and no process reports 19. This is non-deterministic: sometimes the processes end up mapped 1:1 to distinct cores, and sometimes not.
With --display-map:
Data for JOB [62540,1] offset 0 Total slots allocated 24
======================== JOB MAP ========================
Data for node: triton Num slots: 24 Max slots: 0 Num procs: 6
Process OMPI jobid: [62540,1] App: 0 Process rank: 0 Bound: UNBOUND
Process OMPI jobid: [62540,1] App: 0 Process rank: 1 Bound: UNBOUND
Process OMPI jobid: [62540,1] App: 0 Process rank: 2 Bound: UNBOUND
Process OMPI jobid: [62540,1] App: 0 Process rank: 3 Bound: UNBOUND
Process OMPI jobid: [62540,1] App: 0 Process rank: 4 Bound: UNBOUND
Process OMPI jobid: [62540,1] App: 0 Process rank: 5 Bound: UNBOUND
=============================================================
With --report-bindings:
[triton.ecn.purdue.edu:2486163] MCW rank 0 is not bound (or bound to all available processors)
[triton.ecn.purdue.edu:2486163] MCW rank 1 is not bound (or bound to all available processors)
[triton.ecn.purdue.edu:2486163] MCW rank 2 is not bound (or bound to all available processors)
[triton.ecn.purdue.edu:2486163] MCW rank 3 is not bound (or bound to all available processors)
[triton.ecn.purdue.edu:2486163] MCW rank 4 is not bound (or bound to all available processors)
[triton.ecn.purdue.edu:2486163] MCW rank 5 is not bound (or bound to all available processors)
How do I pin my 6 launched processes to 6 different cores?
I have tried --map-by core without passing --cpu-set, and that does assign each process to a particular core, but it does not let me select which CPUs to use (--map-by core and --cpu-set cannot be passed together).
You should not confuse affinity with the core that happens to be executing the process at a given moment. Affinity describes the set of cores on which the OS may schedule a process; the OS is free to move the process between these cores at any time.
The psutil function that reads the affinity information is cpu_affinity(). The function cpu_num() returns the core that is executing the process at the time of the call.
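To see the difference outside of MPI, you can shrink the affinity set of a process by hand (a minimal sketch assuming Linux; the cpu_affinity() setter and cpu_num() are not available on every platform):
import psutil

p = psutil.Process()
print(p.cpu_affinity())  # all cores the OS may schedule us on, e.g. [0, 1, 2, 3]
p.cpu_affinity([1])      # restrict scheduling to core 1 only
print(p.cpu_num())       # the process can now only be executing on core 1
The corrected bla.py reads both values: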
import os
from mpi4py import MPI
import psutil

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
PID = os.getpid()
# cpu_affinity(): the set of cores the OS may schedule this process on
cpu_affinity = psutil.Process().cpu_affinity()
# cpu_num(): the core the process is executing on right now
cpu_num = psutil.Process().cpu_num()
print(f'rank: {rank} has PID: {PID} with affinity {cpu_affinity} on cpu {cpu_num}')
Executing with Open MPI yields:
$ mpirun.openmpi -report-bindings -bind-to core -np 4 python3 ./bla.py
[ITC19206:2587072] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../..]
[ITC19206:2587072] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../..]
[ITC19206:2587072] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/..]
[ITC19206:2587072] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB]
rank: 3 has PID: 2587078 with affinity [3, 7] on cpu 3
rank: 1 has PID: 2587076 with affinity [1, 5] on cpu 5
rank: 0 has PID: 2587075 with affinity [0, 4] on cpu 0
rank: 2 has PID: 2587077 with affinity [2, 6] on cpu 2
The affinity printed by psutil is effectively the same as the mask reported by MPI; they just use different numbering schemes for the hardware threads (e.g., core 1[hwt 0-1] in the MPI output corresponds to OS CPUs 1 and 5 in the psutil output).
The flags for mpirun depend on the MPI implementation. MPICH has different flags.
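(For example, MPICH's Hydra launcher spells core binding as mpiexec -bind-to core; consult its documentation for the equivalents of the options discussed here.)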
The documentation regarding the -cpu-set 1,2 flag might be confusing ("Comma-delimited list of processor IDs to which to bind processes"). It does not specify a pool of CPUs for the bind/map options to draw from; rather, it specifies the set of CPUs to which each individual process is bound. Based on the output from bla.py, the effect seems to be the same as -bind-to cpu-list -cpu-list 1,2.
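One way to verify this (a minimal sketch; the invocation in the comment is only an illustration, and os.sched_getaffinity() is Linux-only) is to print the affinity set on every rank and confirm that each rank reports the full set rather than a single core:
import os
from mpi4py import MPI

# Launched e.g. as: mpirun -cpu-set 1,2 -np 4 python3 check_set.py
# If -cpu-set binds each process to the whole set, every rank
# should print the identical set {1, 2}.
rank = MPI.COMM_WORLD.Get_rank()
print(f'rank {rank}: affinity {os.sched_getaffinity(0)}')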
Adding the ordered option distributes the processes across the cores:
$ mpirun.openmpi -report-bindings -cpu-list 1-2 -bind-to cpu-list:ordered -np 4 python3 ./bla.py
[ITC19206:2645252] MCW rank 3 bound to socket 0[core 2[hwt 1]]: [../../.B/..]
[ITC19206:2645252] MCW rank 0 bound to socket 0[core 1[hwt 0]]: [../B./../..]
[ITC19206:2645252] MCW rank 1 bound to socket 0[core 2[hwt 0]]: [../../B./..]
[ITC19206:2645252] MCW rank 2 bound to socket 0[core 1[hwt 1]]: [../.B/../..]
rank: 1 has PID: 2645256 with affinity [2] on cpu 2
rank: 0 has PID: 2645255 with affinity [1] on cpu 1
rank: 2 has PID: 2645257 with affinity [5] on cpu 5
rank: 3 has PID: 2645258 with affinity [6] on cpu 6
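Applied to the original question, mpirun -report-bindings -cpu-list 15-20 -bind-to cpu-list:ordered -n 6 python3 bla.py should therefore pin the six ranks to six distinct cores from the requested set (untested on the asker's machine, but it follows directly from the example above).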