Hello, I am currently using InfiniBand and testing its performance with the IMB benchmark. I am running the parallel transfer tests and was wondering whether the results really reflect the parallel performance of all 8 processes.
The explanation of the results is too vague for me to understand. Since "( 6 additional processes waiting in MPI_Barrier)" is printed in the results, I suspect that only 2 processes are actually running in those cases.
The throughput and t_avg[usec] results seem reasonable, but I need to make sure that I am understanding them correctly.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
Does the passage above mean that I am running 8 processes in parallel?
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
And does this passage mean that only 4 processes are running in parallel? Help from someone who is familiar with the IMB benchmark would be greatly appreciated. Thanks!
Here is the full result:
# np - 8
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Oct 16 14:14:20 2017
# Machine : x86_64
# System : Linux
# Release : 4.4.0-96-generic
# Version : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version : 3.0
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 13.85 13.85 13.85 0.00
1 1000 12.22 12.22 12.22 0.16
2 1000 10.08 10.08 10.08 0.40
4 1000 9.43 9.43 9.43 0.85
8 1000 8.89 8.91 8.90 1.80
16 1000 8.70 8.71 8.71 3.67
32 1000 9.00 9.00 9.00 7.11
64 1000 8.82 8.82 8.82 14.51
128 1000 8.90 8.90 8.90 28.77
256 1000 8.98 8.98 8.98 56.99
512 1000 9.78 9.78 9.78 104.75
1024 1000 12.65 12.65 12.65 161.91
2048 1000 18.31 18.32 18.31 223.63
4096 1000 20.05 20.05 20.05 408.52
8192 1000 21.15 21.16 21.16 774.11
16384 1000 27.46 27.47 27.46 1193.05
32768 1000 36.93 36.94 36.93 1774.31
65536 640 60.56 60.59 60.57 2163.39
131072 320 117.62 117.63 117.63 2228.57
262144 160 202.67 202.68 202.67 2586.78
524288 80 323.86 324.28 324.07 3233.56
1048576 40 615.05 615.47 615.26 3407.42
2097152 20 1214.74 1216.89 1215.82 3446.74
4194304 10 2471.83 2488.45 2480.14 3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 11.14 11.15 11.15 0.00
1 1000 11.16 11.16 11.16 0.18
2 1000 11.11 11.12 11.12 0.36
4 1000 11.10 11.11 11.10 0.72
8 1000 11.03 11.04 11.03 1.45
16 1000 11.21 11.22 11.22 2.85
32 1000 11.81 11.81 11.81 5.42
64 1000 11.58 11.58 11.58 11.05
128 1000 11.77 11.78 11.78 21.72
256 1000 11.88 11.89 11.89 43.05
512 1000 13.03 13.03 13.03 78.57
1024 1000 14.73 14.74 14.74 138.92
2048 1000 19.37 19.39 19.38 211.24
4096 1000 21.31 21.34 21.33 383.96
8192 1000 26.19 26.22 26.20 624.84
16384 1000 32.65 32.69 32.67 1002.26
32768 1000 48.71 48.78 48.75 1343.52
65536 640 75.14 75.22 75.18 1742.63
131072 320 174.66 175.15 174.94 1496.65
262144 160 301.22 302.02 301.44 1735.95
524288 80 539.40 542.68 540.78 1932.21
1048576 40 1015.45 1026.34 1020.59 2043.32
2097152 20 1959.53 1985.57 1971.34 2112.39
4194304 10 3549.00 3641.61 3590.76 2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.81 12.83 12.82 0.00
1 1000 12.82 12.84 12.83 0.16
2 1000 12.73 12.75 12.74 0.31
4 1000 12.82 12.85 12.84 0.62
8 1000 12.87 12.88 12.87 1.24
16 1000 12.83 12.86 12.84 2.49
32 1000 13.25 13.28 13.26 4.82
64 1000 13.44 13.46 13.45 9.51
128 1000 13.49 13.51 13.50 18.94
256 1000 13.72 13.74 13.73 37.27
512 1000 13.69 13.71 13.70 74.72
1024 1000 15.73 15.75 15.74 130.07
2048 1000 20.72 20.76 20.74 197.28
4096 1000 22.68 22.74 22.72 360.28
8192 1000 29.48 29.52 29.50 555.04
16384 1000 39.89 39.95 39.92 820.31
32768 1000 57.38 57.48 57.43 1140.24
65536 640 95.23 95.34 95.29 1374.78
131072 320 214.61 215.16 214.83 1218.38
262144 160 365.75 368.39 367.28 1423.18
524288 80 679.82 687.10 683.13 1526.08
1048576 40 1277.18 1309.22 1295.65 1601.83
2097152 20 2292.99 2377.56 2339.35 1764.12
4194304 10 4617.95 4919.67 4778.37 1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.41 12.42 12.42 0.00
1 1000 12.47 12.48 12.47 0.32
2 1000 11.93 11.94 11.94 0.67
4 1000 11.95 11.96 11.95 1.34
8 1000 11.91 11.92 11.92 2.69
16 1000 11.97 11.98 11.97 5.34
32 1000 12.80 12.81 12.80 10.00
64 1000 12.84 12.84 12.84 19.93
128 1000 12.90 12.91 12.91 39.67
256 1000 12.90 12.91 12.91 79.34
512 1000 14.04 14.04 14.04 145.82
1024 1000 17.13 17.14 17.13 239.02
2048 1000 21.06 21.06 21.06 389.05
4096 1000 23.32 23.33 23.32 702.41
8192 1000 28.07 28.07 28.07 1167.45
16384 1000 37.81 37.82 37.82 1732.64
32768 1000 55.23 55.24 55.24 2372.75
65536 640 101.04 101.06 101.05 2593.84
131072 320 212.88 212.88 212.88 2462.84
262144 160 362.37 362.38 362.37 2893.62
524288 80 668.88 668.89 668.88 3135.26
1048576 40 1286.48 1287.81 1287.15 3256.92
2097152 20 2463.56 2464.13 2463.84 3404.29
4194304 10 4845.24 4854.75 4849.99 3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 16.46 16.46 16.46 0.00
1 1000 16.42 16.43 16.42 0.24
2 1000 16.17 16.17 16.17 0.49
4 1000 16.17 16.17 16.17 0.99
8 1000 16.19 16.20 16.20 1.98
16 1000 16.21 16.22 16.22 3.94
32 1000 17.20 17.21 17.20 7.44
64 1000 17.09 17.10 17.10 14.97
128 1000 17.24 17.25 17.25 29.68
256 1000 17.40 17.41 17.40 58.83
512 1000 17.59 17.61 17.60 116.32
1024 1000 21.43 21.45 21.44 190.95
2048 1000 29.49 29.50 29.49 277.71
4096 1000 31.63 31.66 31.64 517.58
8192 1000 36.70 36.72 36.71 892.41
16384 1000 49.50 49.53 49.52 1323.07
32768 1000 68.35 68.36 68.36 1917.38
65536 640 108.80 108.85 108.82 2408.31
131072 320 314.38 314.72 314.56 1665.91
262144 160 521.71 522.24 521.94 2007.84
524288 80 930.03 933.47 931.82 2246.62
1048576 40 1729.81 1738.30 1734.66 2412.87
2097152 20 3384.33 3414.99 3403.61 2456.41
4194304 10 6972.50 7058.12 7028.16 2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 18.91 18.93 18.92 0.00
1 1000 19.06 19.08 19.07 0.21
2 1000 18.91 18.92 18.92 0.42
4 1000 19.07 19.09 19.08 0.84
8 1000 18.81 18.83 18.82 1.70
16 1000 19.02 19.03 19.03 3.36
32 1000 19.85 19.85 19.85 6.45
64 1000 19.76 19.78 19.77 12.94
128 1000 19.94 19.96 19.95 25.65
256 1000 20.16 20.18 20.17 50.75
512 1000 20.50 20.51 20.50 99.86
1024 1000 24.52 24.55 24.54 166.83
2048 1000 36.35 36.39 36.37 225.14
4096 1000 38.77 38.81 38.79 422.20
8192 1000 44.79 44.82 44.81 731.12
16384 1000 59.28 59.33 59.31 1104.68
32768 1000 86.39 86.47 86.42 1515.87
65536 640 142.47 142.60 142.53 1838.29
131072 320 402.11 402.98 402.57 1301.04
262144 160 648.90 650.30 649.68 1612.44
524288 80 1209.17 1213.71 1211.74 1727.89
1048576 40 2332.69 2355.17 2344.35 1780.89
2097152 20 4686.88 4767.48 4733.77 1759.55
4194304 10 9457.18 9674.69 9567.31 1734.13
# All processes entering MPI_Finalize
The IMB benchmark tests MPI_Sendrecv and MPI_Exchange all at once, with message sizes from 0 to 4MB and communicator sizes 2, 4 and 8. Since mpirun is invoked once with -np 8, 8 MPI tasks are created. So when testing a size 2 communicator, an extra size 6 communicator is created under the hood, and its 6 MPI tasks simply hang in MPI_Barrier, hence the message:
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
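For what it is worth, here is a minimal sketch (my own illustration, not the actual IMB source) of the mechanism described above: with 8 ranks launched, MPI_Comm_split carves out a 2-rank communicator that performs the measurement, while the 6 remaining ranks simply wait in MPI_Barrier until that round is finished. The buffer size and the choice of barrier communicator are assumptions made for the example.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);        /* 8 when launched with mpirun -np 8 */

    int active_np = 2;                           /* size of the communicator under test */
    int color = (rank < active_np) ? 0 : 1;      /* first 2 ranks are "active" */
    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

    if (color == 0) {
        /* the 2 active ranks run the actual Sendrecv measurement on "sub" */
        char sbuf[4096] = {0}, rbuf[4096];
        int sub_rank, peer;
        MPI_Comm_rank(sub, &sub_rank);
        peer = (sub_rank + 1) % active_np;
        MPI_Sendrecv(sbuf, sizeof sbuf, MPI_BYTE, peer, 0,
                     rbuf, sizeof rbuf, MPI_BYTE, peer, 0,
                     sub, MPI_STATUS_IGNORE);
    }

    /* every rank, including the 6 idle ones, waits here until the round is done,
       hence "( 6 additional processes waiting in MPI_Barrier)" */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Comm_free(&sub);
    MPI_Finalize();
    return 0;
}

So each section of the output is one such round: first the 2-process communicator (6 ranks idle in the barrier), then 4 processes (4 ranks idle), then the full 8. If you only care about the 8-process numbers, the -npmin control flag (e.g. ./IMB-MPI1 -npmin 8 Sendrecv Exchange) should make IMB skip the smaller communicator sizes.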