I have a test machine with an i7-5960X CPU with 8 cores + HT (16 threads), and a program that tries to use both OpenMP and OpenMPI. It is called as follows:
# mpirun -np <NN1> -x OMP_NUM_THREADS=<NN2> <my_prog>
where NN1 and NN2 were varied. In the code I have this:
#include <omp.h>
#include <stdio.h>

int nOMP;
#pragma omp parallel
nOMP = omp_get_num_threads();   /* set inside the parallel region; every thread writes the same value */
int maxOMP = omp_get_max_threads();
int procOMP = omp_get_num_procs();
printf("OMP version running on %d threads. Max threads=%d, available procs=%d\n",
       nOMP, maxOMP, procOMP);
Here are the results:
#1, NN1=2, NN2=2:
OMP version running on 2 threads. Max threads=2, available procs=2
#2, NN1=2, NN2=4:
OMP version running on 4 threads. Max threads=4, available procs=2
#3, NN1=3, NN2=4:
OMP version running on 4 threads. Max threads=4, available procs=16
So for 1 or 2 MPI processes omp_get_num_procs() always returns 2, and for 3 and above it returns 16.
The questions are: why does this happen, and how can I force it to return the correct value for 1 and 2 MPI processes?
The OpenMPI version is 1.10.3 and the GCC version is 4.8.5, on CentOS 7 x86_64. A direct call without mpirun returns the correct value as well.
UPD1: Here is a relevant discussion: http://forum.abinit.org/viewtopic.php?f=2&t=2782, but there is no answer there either.
UPD2: openmpi-2.x.x fixes this.
The default binding is:
- one core per MPI task if there are 2 or fewer tasks,
- one NUMA domain (e.g. a socket) per MPI task if there are more than 2.

So if you mpirun -np 2 ..., there will be only one core per MPI task, but if you mpirun -np 3 ..., each MPI task is bound to a socket.
For example, on my VM (1 socket and 4 cores):
$ mpirun -np 2 grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 0
Cpus_allowed_list: 1
$ mpirun -np 3 grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 0-3
Cpus_allowed_list: 0-3
Cpus_allowed_list: 0-3
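You can also verify this from inside a program; below is a minimal standalone sketch (Linux-specific, my illustration rather than code from the question) that prints the size of the process affinity mask, which is what omp_get_num_procs() reports in this scenario:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    /* The affinity mask of the calling process; mpirun's binding shows up here. */
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        printf("CPUs allowed: %d\n", CPU_COUNT(&set));
    return 0;
}

Running it under mpirun -np 2 and mpirun -np 3 should show the counts change the same way as the grep output above.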
You can mpirun -bind-to socket or even mpirun -bind-to none in order to alter the default binding.
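For example, a sketch of the corrected invocation (the expected count of 16 assumes the question's 8-core/16-thread machine; I have not re-run this exact command):

$ mpirun -np 2 -bind-to none -x OMP_NUM_THREADS=4 ./my_prog

With binding disabled, each MPI task inherits the full machine affinity mask, so omp_get_num_procs() should report 16 even with only 2 MPI tasks.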