
perf_event_open and PERF_COUNT_HW_INSTRUCTIONS


I'm trying to profile an existing application with a fairly complicated structure. For now I am using perf_event_open and the necessary ioctl calls to enable the events I'm interested in.

The man page states that PERF_COUNT_HW_INSTRUCTIONS should be used carefully, so which event should be preferred on a Skylake processor? Maybe a specific Intel PMU event?


Solution

  • The perf_event_open man page (http://man7.org/linux/man-pages/man2/perf_event_open.2.html) says this about PERF_COUNT_HW_INSTRUCTIONS:

    PERF_COUNT_HW_INSTRUCTIONS Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts.

    I think this means that PERF_COUNT_HW_INSTRUCTIONS can be used (and it is supported almost everywhere), but the exact value for a given code fragment may differ slightly between runs because of noise from interrupts or other effects.

    So it is safe to use the PERF_COUNT_HW_INSTRUCTIONS and PERF_COUNT_HW_CPU_CYCLES events on most CPUs. The perf_events subsystem in the Linux kernel maps generic events such as PERF_COUNT_HW_CPU_CYCLES to the raw events best suited to the current CPU and its PMU.

    Depending on your goals, you should collect some statistics on the PERF_COUNT_HW_INSTRUCTIONS values for your code fragment. You can also check the stability of this counter by running perf stat several times on a simple program:

    perf stat -e cycles:u,instructions:u /bin/echo 123
    perf stat -e cycles:u,instructions:u /bin/echo 123
    perf stat -e cycles:u,instructions:u /bin/echo 123
    

    Or use the built-in repeat option of perf stat:

    perf stat --repeat 10 -e cycles:u,instructions:u /bin/echo 123
    

    I see a variation of about ±10 instructions events (less than 0.1%) out of roughly 200 thousand instructions executed, so that counter is very stable. For cycles I see about 5% variation, so it is really the cycles event that deserves the "be careful" warning.
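
As a rough sketch of the same stability check done directly with perf_event_open instead of perf stat, the C code below opens one user-space-only hardware counter, counts it around a code fragment, and computes the spread of repeated counts. The helper names (open_hw_counter, count_event, spread_percent) and the workload function are made up for illustration, and the spread metric (max minus min, relative to the mean) is only one reasonable choice; it is not the figure that perf stat --repeat prints.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so call it through syscall(2). */
static int open_hw_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;      /* e.g. PERF_COUNT_HW_INSTRUCTIONS */
    attr.disabled = 1;         /* start stopped; enabled via ioctl later */
    attr.exclude_kernel = 1;   /* user space only, like the :u modifier */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: the calling thread, on any CPU; no group, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Hypothetical stand-in for the code fragment being measured. */
static void workload(void)
{
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 100000; i++)
        sum += i;
}

/* Count one generic hardware event around fn().
 * Returns 0 on success, -1 if the counter cannot be opened or read. */
static int count_event(uint64_t config, void (*fn)(void), uint64_t *count)
{
    int fd = open_hw_counter(config);
    if (fd < 0)
        return -1;

    int ok = ioctl(fd, PERF_EVENT_IOC_RESET, 0) == 0 &&
             ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0;
    if (ok) {
        fn();
        /* With read_format left at 0, read() returns a single 64-bit count. */
        ok = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == 0 &&
             read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count);
    }
    close(fd);
    return ok ? 0 : -1;
}

/* Spread of repeated counts as (max - min) / mean, in percent. */
static double spread_percent(const uint64_t *values, int n)
{
    if (n <= 0)
        return 0.0;
    uint64_t lo = values[0], hi = values[0];
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (values[i] < lo) lo = values[i];
        if (values[i] > hi) hi = values[i];
        sum += (double)values[i];
    }
    double mean = sum / n;
    return mean > 0.0 ? 100.0 * (double)(hi - lo) / mean : 0.0;
}
```

Calling count_event(PERF_COUNT_HW_INSTRUCTIONS, workload, &counts[i]) in a loop of, say, ten iterations and passing the array to spread_percent gives a number comparable to the variation quoted above; the same loop with PERF_COUNT_HW_CPU_CYCLES covers the cycles event. count_event returns -1 when the counter is unavailable, for example in a VM without a virtualized PMU or when /proc/sys/kernel/perf_event_paranoid is too restrictive.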