Hello, I am currently using InfiniBand and testing its performance with the IMB benchmark. I am running the parallel transfer tests and was wondering whether the results really reflect the parallel performance of all 8 processes.
The explanation of the results is too vague for me to understand. Since "( 6 additional processes waiting in MPI_Barrier)" is printed in the results, I suspect that only 2 processes are actually running in those cases.
The throughput and t_avg[usec] results seem reasonable, but I need to make sure that I am understanding them correctly.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
Does the passage above mean that I am running 8 processes in parallel?
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
And does this passage mean that only 4 processes are running in parallel? Help from someone who is familiar with the IMB benchmark would be greatly appreciated. Thanks!
Here is the full result:
# np - 8
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Oct 16 14:14:20 2017
# Machine : x86_64
# System : Linux
# Release : 4.4.0-96-generic
# Version : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version : 3.0
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 13.85 13.85 13.85 0.00
1 1000 12.22 12.22 12.22 0.16
2 1000 10.08 10.08 10.08 0.40
4 1000 9.43 9.43 9.43 0.85
8 1000 8.89 8.91 8.90 1.80
16 1000 8.70 8.71 8.71 3.67
32 1000 9.00 9.00 9.00 7.11
64 1000 8.82 8.82 8.82 14.51
128 1000 8.90 8.90 8.90 28.77
256 1000 8.98 8.98 8.98 56.99
512 1000 9.78 9.78 9.78 104.75
1024 1000 12.65 12.65 12.65 161.91
2048 1000 18.31 18.32 18.31 223.63
4096 1000 20.05 20.05 20.05 408.52
8192 1000 21.15 21.16 21.16 774.11
16384 1000 27.46 27.47 27.46 1193.05
32768 1000 36.93 36.94 36.93 1774.31
65536 640 60.56 60.59 60.57 2163.39
131072 320 117.62 117.63 117.63 2228.57
262144 160 202.67 202.68 202.67 2586.78
524288 80 323.86 324.28 324.07 3233.56
1048576 40 615.05 615.47 615.26 3407.42
2097152 20 1214.74 1216.89 1215.82 3446.74
4194304 10 2471.83 2488.45 2480.14 3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 11.14 11.15 11.15 0.00
1 1000 11.16 11.16 11.16 0.18
2 1000 11.11 11.12 11.12 0.36
4 1000 11.10 11.11 11.10 0.72
8 1000 11.03 11.04 11.03 1.45
16 1000 11.21 11.22 11.22 2.85
32 1000 11.81 11.81 11.81 5.42
64 1000 11.58 11.58 11.58 11.05
128 1000 11.77 11.78 11.78 21.72
256 1000 11.88 11.89 11.89 43.05
512 1000 13.03 13.03 13.03 78.57
1024 1000 14.73 14.74 14.74 138.92
2048 1000 19.37 19.39 19.38 211.24
4096 1000 21.31 21.34 21.33 383.96
8192 1000 26.19 26.22 26.20 624.84
16384 1000 32.65 32.69 32.67 1002.26
32768 1000 48.71 48.78 48.75 1343.52
65536 640 75.14 75.22 75.18 1742.63
131072 320 174.66 175.15 174.94 1496.65
262144 160 301.22 302.02 301.44 1735.95
524288 80 539.40 542.68 540.78 1932.21
1048576 40 1015.45 1026.34 1020.59 2043.32
2097152 20 1959.53 1985.57 1971.34 2112.39
4194304 10 3549.00 3641.61 3590.76 2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.81 12.83 12.82 0.00
1 1000 12.82 12.84 12.83 0.16
2 1000 12.73 12.75 12.74 0.31
4 1000 12.82 12.85 12.84 0.62
8 1000 12.87 12.88 12.87 1.24
16 1000 12.83 12.86 12.84 2.49
32 1000 13.25 13.28 13.26 4.82
64 1000 13.44 13.46 13.45 9.51
128 1000 13.49 13.51 13.50 18.94
256 1000 13.72 13.74 13.73 37.27
512 1000 13.69 13.71 13.70 74.72
1024 1000 15.73 15.75 15.74 130.07
2048 1000 20.72 20.76 20.74 197.28
4096 1000 22.68 22.74 22.72 360.28
8192 1000 29.48 29.52 29.50 555.04
16384 1000 39.89 39.95 39.92 820.31
32768 1000 57.38 57.48 57.43 1140.24
65536 640 95.23 95.34 95.29 1374.78
131072 320 214.61 215.16 214.83 1218.38
262144 160 365.75 368.39 367.28 1423.18
524288 80 679.82 687.10 683.13 1526.08
1048576 40 1277.18 1309.22 1295.65 1601.83
2097152 20 2292.99 2377.56 2339.35 1764.12
4194304 10 4617.95 4919.67 4778.37 1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.41 12.42 12.42 0.00
1 1000 12.47 12.48 12.47 0.32
2 1000 11.93 11.94 11.94 0.67
4 1000 11.95 11.96 11.95 1.34
8 1000 11.91 11.92 11.92 2.69
16 1000 11.97 11.98 11.97 5.34
32 1000 12.80 12.81 12.80 10.00
64 1000 12.84 12.84 12.84 19.93
128 1000 12.90 12.91 12.91 39.67
256 1000 12.90 12.91 12.91 79.34
512 1000 14.04 14.04 14.04 145.82
1024 1000 17.13 17.14 17.13 239.02
2048 1000 21.06 21.06 21.06 389.05
4096 1000 23.32 23.33 23.32 702.41
8192 1000 28.07 28.07 28.07 1167.45
16384 1000 37.81 37.82 37.82 1732.64
32768 1000 55.23 55.24 55.24 2372.75
65536 640 101.04 101.06 101.05 2593.84
131072 320 212.88 212.88 212.88 2462.84
262144 160 362.37 362.38 362.37 2893.62
524288 80 668.88 668.89 668.88 3135.26
1048576 40 1286.48 1287.81 1287.15 3256.92
2097152 20 2463.56 2464.13 2463.84 3404.29
4194304 10 4845.24 4854.75 4849.99 3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 16.46 16.46 16.46 0.00
1 1000 16.42 16.43 16.42 0.24
2 1000 16.17 16.17 16.17 0.49
4 1000 16.17 16.17 16.17 0.99
8 1000 16.19 16.20 16.20 1.98
16 1000 16.21 16.22 16.22 3.94
32 1000 17.20 17.21 17.20 7.44
64 1000 17.09 17.10 17.10 14.97
128 1000 17.24 17.25 17.25 29.68
256 1000 17.40 17.41 17.40 58.83
512 1000 17.59 17.61 17.60 116.32
1024 1000 21.43 21.45 21.44 190.95
2048 1000 29.49 29.50 29.49 277.71
4096 1000 31.63 31.66 31.64 517.58
8192 1000 36.70 36.72 36.71 892.41
16384 1000 49.50 49.53 49.52 1323.07
32768 1000 68.35 68.36 68.36 1917.38
65536 640 108.80 108.85 108.82 2408.31
131072 320 314.38 314.72 314.56 1665.91
262144 160 521.71 522.24 521.94 2007.84
524288 80 930.03 933.47 931.82 2246.62
1048576 40 1729.81 1738.30 1734.66 2412.87
2097152 20 3384.33 3414.99 3403.61 2456.41
4194304 10 6972.50 7058.12 7028.16 2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 18.91 18.93 18.92 0.00
1 1000 19.06 19.08 19.07 0.21
2 1000 18.91 18.92 18.92 0.42
4 1000 19.07 19.09 19.08 0.84
8 1000 18.81 18.83 18.82 1.70
16 1000 19.02 19.03 19.03 3.36
32 1000 19.85 19.85 19.85 6.45
64 1000 19.76 19.78 19.77 12.94
128 1000 19.94 19.96 19.95 25.65
256 1000 20.16 20.18 20.17 50.75
512 1000 20.50 20.51 20.50 99.86
1024 1000 24.52 24.55 24.54 166.83
2048 1000 36.35 36.39 36.37 225.14
4096 1000 38.77 38.81 38.79 422.20
8192 1000 44.79 44.82 44.81 731.12
16384 1000 59.28 59.33 59.31 1104.68
32768 1000 86.39 86.47 86.42 1515.87
65536 640 142.47 142.60 142.53 1838.29
131072 320 402.11 402.98 402.57 1301.04
262144 160 648.90 650.30 649.68 1612.44
524288 80 1209.17 1213.71 1211.74 1727.89
1048576 40 2332.69 2355.17 2344.35 1780.89
2097152 20 4686.88 4767.48 4733.77 1759.55
4194304 10 9457.18 9674.69 9567.31 1734.13
# All processes entering MPI_Finalize
The IMB benchmark tests MPI_Sendrecv and MPI_Exchange all at once, with message sizes from 0 to 4MB and communicator sizes 2, 4 and 8. Since mpirun is invoked once with -np 8, 8 MPI tasks are created. So when testing a size 2 communicator, an extra size 6 communicator is created under the hood, and its 6 MPI tasks simply hang in MPI_Barrier, hence the message:
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
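For what it is worth, here is a minimal sketch (my own illustration, not the actual IMB source) of the mechanism described above: with 8 ranks launched, MPI_Comm_split carves out a 2-rank communicator that performs the measurement, while the 6 remaining ranks simply wait in MPI_Barrier until that round is finished. The buffer size and the choice of barrier communicator are assumptions made for the example.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);        /* 8 when launched with mpirun -np 8 */

    int active_np = 2;                           /* size of the communicator under test */
    int color = (rank < active_np) ? 0 : 1;      /* first 2 ranks are "active" */
    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

    if (color == 0) {
        /* the 2 active ranks run the actual Sendrecv measurement on "sub" */
        char sbuf[4096] = {0}, rbuf[4096];
        int sub_rank, peer;
        MPI_Comm_rank(sub, &sub_rank);
        peer = (sub_rank + 1) % active_np;
        MPI_Sendrecv(sbuf, sizeof sbuf, MPI_BYTE, peer, 0,
                     rbuf, sizeof rbuf, MPI_BYTE, peer, 0,
                     sub, MPI_STATUS_IGNORE);
    }

    /* every rank, including the 6 idle ones, waits here until the round is done,
       hence "( 6 additional processes waiting in MPI_Barrier)" */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Comm_free(&sub);
    MPI_Finalize();
    return 0;
}

So each section of the output is one such round: first the 2-process communicator (6 ranks idle in the barrier), then 4 processes (4 ranks idle), then the full 8. If you only care about the 8-process numbers, the -npmin control flag (e.g. ./IMB-MPI1 -npmin 8 Sendrecv Exchange) should make IMB skip the smaller communicator sizes.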