I am running a C++ benchmark test for a specific application. In this test, I open the performance counter file (__NR_perf_event_open syscall) before the critical section, proceed with the section and then after read the specified metric (instructions, cycles, branches, cachemisses, etc).
I verified that this needs to run under sudo because the process needs CAP_PERFCOUNT capabilities. I also have to verify that /proc/sys/kernel/perf_event_paranoid
is set to a number higher than 2, which seems to be always the case with Ubuntu 20.04.3 with kernel 5.11.0 which is the OS I standardized across tests.
This setup works on all my local machines. On the cloud, however, it works only on some instances as m5zn.6xlarge (Intel Xeon Platinum 8252C). It does not work on others as t3.medium, c3.4xlarge, c5a.8xlarge.
The AMI on all them are the same ami-09e67e426f25ce0d7.
One easy way to verify this behavior is run the following command:
sudo perf stat /bin/sleep 1
On the m5zn box I will see:
Performance counter stats for '/bin/sleep 1':
0.54 msec task-clock # 0.001 CPUs utiliz
1 context-switches # 0.002 M/sec
1 cpu-migrations # 0.002 M/sec
75 page-faults # 0.139 M/sec
2191485 cycles # 4.070 GHz
1292564 instructions # 0.59 insn per cyc
258373 branches # 479.860 M/sec
11090 branch-misses # 4.29% of all branc
1.000902741 seconds time elapsed
0.000889000 seconds user
0.000000000 seconds sys
While on the other boxes I will see:
Performance counter stats for '/bin/sleep 1':
0.62 msec task-clock # 0.001 CPUs utilized
2 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
76 page-faults # 0.124 M/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
1.002488031 seconds time elapsed
0.000930000 seconds user
0.000000000 seconds sys
Perf with not supported values
My suspicion is that the m5zn.6xlarge is backed by a real instance while the others are shared instances. is my suspicion correct?
What instances I can launch that will provide me with performance counter PMU support?
Thank you!
After some research I found out that because all Amazon AWS instances are virtual instances, none of the guest operating systems can directly access the hardware performance counters (PMC or PMU).
The guest OS can only read the performance counters through a kernel driver called virtual PMU (vPMU), which is available only for certain Intel Xeon CPUs.
Therefore in my attempted list of instances, only the m5zn with an Intel Platinum 8252 has a supported CPU.
It is easy to check if the guest OS supports vPMU by running
cat /proc/cpuinfo | grep arch_perfmon
It is also possible to check in the dmesg output right after smpboot:
[ 0.916264] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x4)
[ 0.916410] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
On AWS the rule of thumb is that you will get vPMU only on the largest instances, or instances that take an entire socket.
https://oavdeev.github.io/posts/vpmu_support_z1d/
Currently these instances support vPMU:
i3.metal
c5.9xlarge
c5.18xlarge
m4.16xlarge
m5.12xlarge
m5.24xlarge
r5.12xlarge
r5.24xlarge
f1.16xlarge
h1.16xlarge
i3.16xlarge
p2.16xlarge
p3.16xlarge
r4.16xlarge
x1.32xlarge
c5d.9xlarge
c5d.18xlarge
m5d.12xlarge
m5d.24xlarge
r5d.12xlarge
r5d.24xlarge
x1e.32xlarge