
Perf instruction/cycles count in userspace/kernelspace alone in Linux


I'm trying to profile an application that has both userspace and kernelspace code using perf. I have tried enabling various kernel configuration options, but I'm unable to get the instruction/cycle counts for userspace or kernelspace alone. I tried appending the ":u" and ":k" modifiers to the instructions and cycles events, but all I get in reply is

$ perf stat -e cycles:u,instructions:u ls

 Performance counter stats for 'ls':

   <not supported>      cycles:u

   <not supported>      instructions:u

       0.006047045 seconds time elapsed

       0.000000000 seconds user
       0.008098000 seconds sys

However, running with plain cycles/instructions (no modifier) gives a proper result, something like the one below.

$ perf stat -e cycles,instructions ls

 Performance counter stats for 'ls':

          5362086      cycles
            528783      instructions              #    0.10  insn per cycle

       0.005487940 seconds time elapsed

       0.007800000 seconds user
       0.000000000 seconds sys

Note: ls is just used as an example here to highlight the issue.

I'm running Linux 5.4 with perf version 5.4.77.g1206eede9156, and I'm running the above commands on an ARM board. Below are the configuration options that I've enabled in the Linux kernel:

CONFIG_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_KPROBES=y
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_FRAME_POINTER=y
CONFIG_FTRACE=y
CONFIG_KPROBE_EVENTS=y
CONFIG_UPROBE_EVENTS=y
CONFIG_PROBE_EVENTS=y

Further, perf list on the command line shows the hardware and software events, among many others:

$ perf list
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]

Kindly suggest how to fix this issue. Am I doing anything wrong?


Solution

  • Works for me: perf stat -e cycles:u ls reports 444,022 cycles:u. That is with perf version 5.13.g62fb9874f5da, on Linux 5.12.15-arch1-1, on bare metal (x86-64 Skylake), with perf_event_paranoid=0.
    (With modern perf you can also use perf stat --all-user to imply :u for all events.)

    I'm guessing your ARM CPU's hardware perf counters don't support being programmed with a mask for privilege-level, so perf reports that there is no hardware counter capable of counting only user-space instructions.

    AFAIK, there aren't hooks at every interrupt entry point to enable / disable HW counters; counting only kernel, only user, or both, is purely a hardware feature.

    HW support is essential for accurate counts, because in a software implementation the counters would keep counting after entry to the kernel until the kernel code that saves the current counts had run. (Likewise on the way out: kernel code that runs after restoring the state, before returning to user-space, would still be counted.) Also, it would make every interrupt and system call even more expensive, instead of only virtualizing perf counters by saving/restoring them on every context switch between tasks/threads. So there are good reasons for the kernel not to support a loose attempt to do it in software, even on CPUs that don't have HW support for a privilege mask.
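
To check quickly which modifier variants a given board rejects, the perf stat report can be filtered for the "<not supported>" marker shown in the question. The following is only a minimal sketch: unsupported_events is a hypothetical helper name (it is not part of perf), and it assumes the human-readable report layout seen above, where the marker is followed by the event name.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): reads a human-readable
# `perf stat` report on stdin and prints the name of every event
# that perf marked as "<not supported>".
unsupported_events() {
    # A rejected event line looks like:
    #    <not supported>      cycles:u
    # so fields 1 and 2 form the marker and field 3 is the event name.
    awk '$1 == "<not" && $2 == "supported>" { print $3 }'
}

# Assumed usage on the target board. perf stat writes its report to
# stderr, hence the 2>&1 before the pipe:
#   perf stat -e cycles:u,cycles:k,instructions:u,instructions:k true 2>&1 | unsupported_events
```

If the helper prints the :u and :k variants while plain cycles and instructions still count, that is consistent with the guess above that the PMU (or its driver) cannot filter by privilege level, rather than with a missing kernel config option.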