Tags: parallel-processing, mpi, multicore, supercomputers

MPI Send latency for different process localities


I am currently taking a course on efficient programming of supercomputers and multicore processors. Our recent assignment is to measure the latency of MPI_Send (i.e., the time spent sending a zero-byte message). That alone would not be hard, but we have to perform our measurements for the following cases:

  • communication between processes on the same processor,
  • on the same node but on different processors,
  • and between processes on different nodes.

I am wondering: how do I determine this? For processes on different nodes, I thought about hashing the name returned by MPI_Get_processor_name, which identifies the node a process is currently running on, and sending it as a tag. I also tried using sched_getcpu() to get the core ID, but it seems that this returns an incrementing logical CPU number even when the cores are hyper-threaded (so two processes could end up on the same physical core). How do I go about this? I just need a concept for determining the localities, not complete code for the stated problem. Thank you!
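
For illustration, here is a rough sketch of the kind of check I have in mind (it uses glibc's sched_getcpu(), must be run with exactly two ranks, and only tells nodes and logical CPU IDs apart, not which socket a core belongs to):

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>   /* sched_getcpu() */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank, len, cpu, peer_cpu;
        char name[MPI_MAX_PROCESSOR_NAME], peer_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Get_processor_name(name, &len);
        cpu = sched_getcpu();            /* logical CPU this rank currently runs on */

        /* ranks 0 and 1 exchange their node name and logical CPU ID */
        int peer = 1 - rank;
        MPI_Sendrecv(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
                     peer_name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cpu, 1, MPI_INT, peer, 1,
                     &peer_cpu, 1, MPI_INT, peer, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank == 0) {
            if (strcmp(name, peer_name) != 0)
                printf("different nodes\n");
            else if (cpu != peer_cpu)
                printf("same node, different logical CPUs (%d vs %d)\n", cpu, peer_cpu);
            else
                printf("same logical CPU\n");
        }

        MPI_Finalize();
        return 0;
    }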


Solution

  • In order to have both MPI processes placed on separate cores of the same socket, you should pass the following options to mpiexec:

    -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=compact
    

    In order to have both MPI processes on cores from different sockets, you should use:

    -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=scatter
    

    In order to have them on two separate machines, you should create a host file that provides only one slot per node or use:

    -perhost 1 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core
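
    For example, with Intel MPI's Hydra launcher a host file along these lines (node100 and node101 stand in for your actual node names; pass it to mpiexec with -machinefile) would provide one slot per node:

    node100:1
    node101:1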
    

    You can check the actual pinning/binding on Linux by calling sched_getaffinity() and examining the returned affinity mask. Alternatively, you could parse /proc/self/status and look for Cpus_allowed or Cpus_allowed_list. On Windows, GetProcessAffinityMask() returns the active affinity mask.
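
    A minimal sketch of the sched_getaffinity() route on Linux (plain C, no MPI; each rank would run the same check and print the CPUs it is allowed to run on):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);

        /* pid 0 means "the calling process" */
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        printf("allowed CPUs:");
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &mask))
                printf(" %d", cpu);
        printf("\n");
        return 0;
    }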

    You could also ask Intel MPI to report the final pinning by setting I_MPI_DEBUG to 4, but it produces a lot of other output in addition to the pinning information. Look for lines that resemble the following:

    [0] MPI startup(): 0       1234     node100  {0}
    [0] MPI startup(): 1       1235     node100  {1}