I'm trying to profile an existing application with a fairly complicated structure. For now I am using perf_event_open and the necessary ioctl calls to enable the events I am interested in. The manpage states that PERF_COUNT_HW_INSTRUCTIONS should be used carefully, so which event should be preferred on a Skylake processor? Maybe a specific Intel PMU?
The perf_event_open manpage http://man7.org/linux/man-pages/man2/perf_event_open.2.html says about PERF_COUNT_HW_INSTRUCTIONS:
PERF_COUNT_HW_INSTRUCTIONS
Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts.
I think this means that PERF_COUNT_HW_INSTRUCTIONS can be used (and it is supported almost everywhere), but the exact value for a given code fragment may differ slightly between runs because of noise from interrupts or other effects.
So it is safe to use the PERF_COUNT_HW_INSTRUCTIONS and PERF_COUNT_HW_CPU_CYCLES events on most CPUs. The perf_events subsystem in the Linux kernel maps generic events such as PERF_COUNT_HW_CPU_CYCLES to the raw events best suited to the current CPU and its PMU.
Depending on your goals, you should collect some statistics on the PERF_COUNT_HW_INSTRUCTIONS values for your code fragment. You can also check the stability of this counter by running perf stat several times on some simple program:
perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123
perf stat -e cycles:u,instructions:u /bin/echo 123
Or use the built-in repeat option of perf stat:
perf stat --repeat 10 -e cycles:u,instructions:u /bin/echo 123
I see a variation of about ±10 instructions events (less than 0.1%) out of roughly 200 thousand instructions executed, so this counter is very stable. For cycles I see about 5% variation, so it is really the cycles event that deserves the "be careful" warning.
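If you want the same stability check as numbers you can compare, the repeated runs can be scripted. Below is a minimal Python sketch, not a definitive tool. It assumes perf is installed and on PATH, that the CSV output of perf stat -x, starts each counter line with value, unit, event name, and that event names are printed as they were given on the command line (this may differ on some CPUs). The helper names parse_perf_csv, run_once and relative_spread are made up for this example.

```python
import shutil
import statistics
import subprocess


def parse_perf_csv(text):
    """Return {event_name: count} from `perf stat -x,` output.

    Assumes each counter line starts with: value, unit, event name.
    Lines whose value is not a plain integer (for example
    "<not supported>") are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return counts


def run_once(events, command):
    """Run `command` once under perf stat and return its parsed counters."""
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", ",".join(events), "--", *command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # perf stat writes its counters to stderr
        text=True,
    )
    return parse_perf_csv(result.stderr)


def relative_spread(samples):
    """Return (max - min) / mean of the samples, as a fraction."""
    mean = statistics.mean(samples)
    if mean == 0:
        return 0.0
    return (max(samples) - min(samples)) / mean


if __name__ == "__main__":
    events = ["cycles:u", "instructions:u"]
    if shutil.which("perf") is None:
        print("perf not found on PATH")
    else:
        runs = [run_once(events, ["/bin/echo", "123"]) for _ in range(10)]
        for event in events:
            # Keep only the runs in which this event was actually counted.
            samples = [run[event] for run in runs if event in run]
            if samples:
                spread = relative_spread(samples)
                print(f"{event}: spread {spread:.3%} over {len(samples)} runs")
            else:
                print(f"{event}: no samples (unsupported or not permitted)")
```

The spread here is simply (max - min) / mean, which is a cruder measure than the deviation perf stat --repeat reports, but it is enough to see whether instructions:u is noticeably more stable than cycles:u on your machine.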