I am trying to profile my application 20 seconds after starting because I'm not interested in the initialization stage.
So I used perf
with the delay (microseconds) switch:
sudo perf stat --delay=20000 -e page-faults my_app
Initially I was getting the error:
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
So I tried this:
echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog
The error disappeared but Perf isn't counting the event:
<not counted> page-faults
I then tried again, removing the delay switch:
sudo perf stat -e page-faults my_app
and it works.
When using the delay switch, it does output events enabled
.
Am I using this switch wrong/how can I begin profiling after N seconds?
Works for me with perf version 6.3-4
on x86-64 Arch GNU/Linux, with Linux kernel 6.4.9-arch1-1. In case the CPU matters for software events, it's an i7-6700k Skylake.
Did your code exit before the 20 second delay was over? In that case, no events get counted for any counter. I can reproduce your symptoms with a busy loop that exits after about 3.6 seconds on my system:
perf stat --delay=20000 -e task-clock,cycles,page-faults \
awk 'BEGIN{for(i=0;i<100000000;i++){}}'
It prints Events disabled
, then Events enabled
right as it's exiting, but <not counted>
for both the hardware (cycles
) and software (task-clock and page-faults) events.
Reducing the delay to 2000
ms (2 seconds), it does count for the second ~half of the 3.6 second run time:
$ perf stat --delay=2000 -e task-clock,cycles,page-faults awk 'BEGIN{for(i=0;i<100000000;i++){}}'
Events disabled
Events enabled
Performance counter stats for 'awk BEGIN{for(i=0;i<100000000;i++){}}':
1,591.87 msec task-clock # 1.000 CPUs utilized
6,208,241,402 cycles # 3.900 GHz
6 page-faults # 3.769 /sec
1.591902071 seconds time elapsed
3.590486000 seconds user
0.000000000 seconds sys
awk
is single-threaded, but note the 1.59 sec elapsed vs. 3.59 sec "user" time: 2 seconds less, the time during which it wasn't counting. I guess the user/sys times come from OS per-process counters, rather than perf
looking at the wall-clock.