Tags: parallel-processing, mpi, multicore, supercomputers

MPI Send latency for different process localities


I am currently taking a course on efficient programming of supercomputers and multicore processors. Our recent assignment is to measure the latency of MPI_Send (i.e., the time spent sending a zero-byte message). That alone would not be hard, but we have to perform our measurements for the following cases:

  • communication between processes on the same processor,
  • on the same node but on different processors,
  • and between processes on different nodes.

I am wondering: how do I determine this? For processes on different nodes, I thought about hashing the name returned by MPI_Get_processor_name, which identifies the node a process is currently running on, and sending it as a tag. I also tried using sched_getcpu() to get the core ID, but it seems that this returns an incrementing logical CPU number even when the cores are hyper-threaded (so two processes could end up on the same physical core). How do I go about this? I just need a concept for determining the localities, not complete code for the stated problem. Thank you!
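
For illustration, here is a rough sketch of the kind of check I have in mind (it uses glibc's sched_getcpu(), must be run with exactly two ranks, and only tells nodes and logical CPU IDs apart, not which socket a core belongs to):

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <sched.h>   /* sched_getcpu() */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank, len, cpu, peer_cpu;
        char name[MPI_MAX_PROCESSOR_NAME], peer_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Get_processor_name(name, &len);
        cpu = sched_getcpu();            /* logical CPU this rank currently runs on */

        /* ranks 0 and 1 exchange their node name and logical CPU ID */
        int peer = 1 - rank;
        MPI_Sendrecv(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
                     peer_name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cpu, 1, MPI_INT, peer, 1,
                     &peer_cpu, 1, MPI_INT, peer, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank == 0) {
            if (strcmp(name, peer_name) != 0)
                printf("different nodes\n");
            else if (cpu != peer_cpu)
                printf("same node, different logical CPUs (%d vs %d)\n", cpu, peer_cpu);
            else
                printf("same logical CPU\n");
        }

        MPI_Finalize();
        return 0;
    }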


Solution

  • In order to have both MPI processes placed on separate cores of the same socket, you should pass the following options to mpiexec:

    -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=compact
    

    In order to have both MPI processes on cores from different sockets, you should use:

    -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core -genv I_MPI_PIN_ORDER=scatter
    

    In order to have them on two separate machines, you should create a host file that provides only one slot per node or use:

    -perhost 1 -genv I_MPI_PIN=1 -genv I_MPI_PIN_DOMAIN=core
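
    For example, with Intel MPI's Hydra launcher a host file along these lines (node100 and node101 stand in for your actual node names; pass it to mpiexec with -machinefile) would provide one slot per node:

    node100:1
    node101:1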
    

    You can check the actual pinning/binding on Linux by calling sched_getaffinity() and examining the returned affinity mask. Alternatively, you could parse /proc/self/status and look for Cpus_allowed or Cpus_allowed_list. On Windows, GetProcessAffinityMask() returns the active affinity mask.
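
    A minimal sketch of the sched_getaffinity() route on Linux (plain C, no MPI; each rank would run the same check and print the CPUs it is allowed to run on):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);

        /* pid 0 means "the calling process" */
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        printf("allowed CPUs:");
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &mask))
                printf(" %d", cpu);
        printf("\n");
        return 0;
    }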

    You could also ask Intel MPI to report the final pinning by setting I_MPI_DEBUG to 4, but it produces a lot of other output in addition to the pinning information. Look for lines that resemble the following:

    [0] MPI startup(): 0       1234     node100  {0}
    [0] MPI startup(): 1       1235     node100  {1}