I have 8 RTX GPUs. When I run p2pBandwidthLatencyTest, the latencies between GPU0 and GPU1, GPU2 and GPU3, GPU4 and GPU5, and GPU6 and GPU7 are about 49 ms, tens of thousands of times higher than for the other pairs:
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1 2 3 4 5 6 7
0 1.80 49354.72 1.70 1.70 1.74 1.74 1.74 1.72
1 49354.84 1.37 1.70 1.69 1.74 1.76 1.73 1.72
2 1.88 1.81 1.73 49355.00 1.79 1.76 1.76 1.75
3 1.88 1.79 49354.85 1.33 3.79 3.84 3.88 3.91
4 1.89 1.88 1.90 1.87 1.72 49354.96 3.49 3.56
5 2.30 1.93 1.88 1.89 49354.89 1.32 3.63 3.60
6 2.55 2.53 2.37 2.29 2.24 2.26 3.50 49354.77
7 2.30 2.27 2.29 1.87 1.82 1.83 49354.85 1.36
Compare this with the results when peer-to-peer is disabled:
P2P=Disabled Latency Matrix (us)
GPU 0 1 2 3 4 5 6 7
0 1.80 14.31 13.86 13.49 14.52 13.89 13.58 13.58
1 13.71 1.82 14.44 13.95 14.65 13.62 15.05 15.20
2 13.38 14.23 1.73 16.59 13.77 15.44 14.10 13.64
3 12.68 15.62 12.50 1.77 14.92 15.01 15.17 14.87
4 13.51 13.60 15.09 13.40 1.27 12.48 12.68 19.47
5 14.92 13.84 13.42 13.42 16.53 1.30 16.37 16.60
6 14.29 13.62 14.66 13.62 14.90 13.70 1.32 14.33
7 14.26 13.42 14.35 13.53 16.89 14.26 17.03 1.36
Is this normal?
It turns out that the extremely slow peer-to-peer latency is abnormal.
After I disabled the IOMMU (Intel VT-d) in the BIOS, the problem went away:
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1 2 3 4 5 6 7
0 1.34 1.22 1.68 1.69 1.71 1.70 1.75 1.73
1 1.20 1.38 1.70 1.67 1.71 1.75 1.75 1.72
2 1.69 1.67 1.29 1.20 1.73 1.75 1.75 1.75
3 1.69 1.66 1.17 1.29 1.74 1.75 1.72 1.73
4 1.72 1.76 1.74 1.70 1.32 1.13 1.66 1.70
5 1.74 1.73 1.75 1.74 1.18 1.28 1.67 1.69
6 1.75 1.74 1.74 1.72 1.67 1.68 1.31 1.19
7 1.76 1.75 1.73 1.73 1.67 1.69 1.18 1.32
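To spot such pairs without reading the matrix by eye, here is a minimal Python sketch. It assumes the matrix text is pasted in the same layout as above (a header line starting with "GPU", then one row per GPU). The function names and the 100x threshold are my own choices for illustration, not part of p2pBandwidthLatencyTest.

```python
from statistics import median

# First four rows/columns of the P2P=Enabled matrix from the broken setup.
SAMPLE = """
GPU 0 1 2 3
0 1.80 49354.72 1.70 1.70
1 49354.84 1.37 1.70 1.69
2 1.88 1.81 1.73 49355.00
3 1.88 1.79 49354.85 1.33
"""

def parse_latency_matrix(text):
    """Return the latency rows as lists of floats, skipping title/header lines."""
    matrix = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Data rows start with the GPU index; the header starts with "GPU".
        if not fields or not fields[0].isdigit():
            continue
        matrix.append([float(v) for v in fields[1:]])
    return matrix

def slow_pairs(matrix, factor=100.0):
    """Return (src, dst, latency) for off-diagonal entries that exceed
    `factor` times the median off-diagonal latency (an arbitrary threshold)."""
    off_diag = [v for i, row in enumerate(matrix)
                for j, v in enumerate(row) if i != j]
    baseline = median(off_diag)
    return [(i, j, v) for i, row in enumerate(matrix)
            for j, v in enumerate(row)
            if i != j and v > factor * baseline]

if __name__ == "__main__":
    for src, dst, lat in slow_pairs(parse_latency_matrix(SAMPLE)):
        print(f"GPU{src} -> GPU{dst}: {lat:.2f} us")
```

On the sample it flags the GPU0/GPU1 and GPU2/GPU3 pairs in both directions. The median is used as the baseline so that a few huge values do not shift the threshold.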
The problem appears to be the same as, or very similar to, the one in these discussions:
A few possible solutions are mentioned in the discussions:
Disable IOMMU:
Disable ACS:
On the affected system, only the IOMMU was enabled in the BIOS. ACS was not turned on: lspci -vvv | grep ACS returned nothing.
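For a slightly more detailed check than grep, the sketch below parses lspci -vvv output and lists devices whose ACSCtl line has any flag turned on (shown with a trailing "+"). The function name and the sample text are hypothetical. Note that lspci usually needs root to show capabilities at all, so an empty result without root may be misleading.

```python
def devices_with_acs_enabled(lspci_text):
    """Return [(device_address, [enabled ACS flags])] from `lspci -vvv` output."""
    enabled = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device, e.g. "00:01.0 PCI bridge: ..."
            current = line.split()[0]
        elif "ACSCtl:" in line and current is not None:
            flags = line.split("ACSCtl:", 1)[1].split()
            # lspci marks enabled bits with "+" and disabled bits with "-".
            on = [f[:-1] for f in flags if f.endswith("+")]
            if on:
                enabled.append((current, on))
    return enabled

# Hypothetical excerpt in the layout lspci prints; not taken from my system.
SAMPLE_LSPCI = (
    "00:01.0 PCI bridge: Example bridge\n"
    "\t\tACSCtl:\tSrcValid+ TransBlk- ReqRedir+ CmpltRedir- UpstreamFwd-\n"
    "00:02.0 PCI bridge: Example bridge\n"
    "\t\tACSCtl:\tSrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-\n"
)

if __name__ == "__main__":
    # On a real system, the text could come from:
    #   subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout
    print(devices_with_acs_enabled(SAMPLE_LSPCI))
```

An empty list corresponds to what I saw: no device with ACS enabled.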
==============================
Background on I/O MMU:
https://en.wikipedia.org/wiki/X86_virtualization#I/O_MMU_virtualization_(AMD-Vi_and_Intel_VT-d)
The IOMMU is part of x86 virtualization: it is the I/O virtualization performed by the chipset. Besides IOMMU, it is also called AMD-Vi or Intel VT-d. It is not to be confused with AMD-V and Intel VT-x, which are virtualization features of the CPU.
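To see from a running Linux system whether the IOMMU is active, without rebooting into the BIOS, one option is to count the IOMMU groups the kernel exposes under /sys/kernel/iommu_groups. This is a rough sketch: the function name is my own, and a count of zero only suggests the IOMMU is disabled or not exposed, it does not prove it.

```python
from pathlib import Path

def iommu_group_count(groups_dir="/sys/kernel/iommu_groups"):
    """Count IOMMU groups under `groups_dir`; 0 if the directory is missing."""
    path = Path(groups_dir)
    if not path.is_dir():
        return 0
    # Each active IOMMU group appears as a numbered subdirectory.
    return sum(1 for entry in path.iterdir() if entry.is_dir())

if __name__ == "__main__":
    count = iommu_group_count()
    if count:
        print(f"IOMMU appears to be active ({count} groups)")
    else:
        print("No IOMMU groups found; IOMMU is likely disabled")
```

The directory is a parameter only so the function can be tried against a test directory.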