
Why does perf show that sleep takes all cores?


I am trying to familiarize myself with perf by running it against various programs I wrote.

When I launch it against a program that is 100% single-threaded, perf shows that it uses two cores on the machine (task-clock event). Here's example output:

perf stat  -a --per-core python3 test.py

Performance counter stats for 'system wide':

    S0-C0           1       19004.951263      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C0           1              5,582      context-switches                                              (100.00%)
    S0-C0           1                 19      cpu-migrations                                                (100.00%)
    S0-C0           1              3,746      page-faults                                                 
    S0-C0           1    <not supported>      cycles                   
    S0-C0           1    <not supported>      stalled-cycles-frontend  
    S0-C0           1    <not supported>      stalled-cycles-backend   
    S0-C0           1    <not supported>      instructions             
    S0-C0           1    <not supported>      branches                 
    S0-C0           1    <not supported>      branch-misses            
    S0-C1           1       19004.950059      task-clock (msec) # 1.000 CPUs utilized            (100.00%)
    S0-C1           1              6,752      context-switches                                              (100.00%)
    S0-C1           1                 25      cpu-migrations                                                (100.00%)
    S0-C1           1                935      page-faults                                                 
    S0-C1           1    <not supported>      cycles                   
    S0-C1           1    <not supported>      stalled-cycles-frontend  
    S0-C1           1    <not supported>      stalled-cycles-backend   
    S0-C1           1    <not supported>      instructions             
    S0-C1           1    <not supported>      branches                 
    S0-C1           1    <not supported>      branch-misses            

      19.004688019 seconds time elapsed

It even shows that a simple sleep command uses two cores on my computer, and I can't explain this. I understand that the OS scheduler can move a process to a different core at any time, but in that case the CPU utilization figures would reflect it.

Can anyone explain this?


Solution

  • According to the man page of the perf stat subcommand, the -a option profiles the whole system: http://man7.org/linux/man-pages/man1/perf-stat.1.html

       -a, --all-cpus
           system-wide collection from all CPUs (default if no target is
           specified)
    

    In this "system-wide" mode, perf stat counts events on all CPUs in the system, and perf record likewise profiles all of them. Without a command argument, perf runs until interrupted with Ctrl-C. With a command argument, perf counts or profiles until that command exits. Typical usage is:

    perf stat -a sleep 10      # Count events on every CPU for 10 seconds
    perf record -a sleep 10    # Sample cycles on every CPU for 10 seconds, writing perf.data
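
The numbers in the question fit this reading. As far as I understand it, in system-wide mode the task-clock counter runs on each CPU for the whole measurement whether that CPU is busy or idle, and "CPUs utilized" is task-clock divided by elapsed time. A small sketch of that arithmetic, using values copied from the question's output (the variable names are mine, for illustration only):

```shell
# Values copied from the question's output for core S0-C0.
task_clock_msec=19004.951263
elapsed_sec=19.004688019

# "CPUs utilized" = task-clock (converted to seconds) / elapsed wall time.
cpus_utilized=$(awk -v t="$task_clock_msec" -v e="$elapsed_sec" \
    'BEGIN { printf "%.3f", (t / 1000) / e }')

echo "CPUs utilized: $cpus_utilized"
```

So a value of 1.000 on every core here says only that each core was observed for the full 19 seconds, not that test.py kept each core busy.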
    

    To get stats for a single command, use single-process profiling (without the -a option):

    perf stat python3 test.py
    

    For profiling (perf record), you can run without the -a option, or you can use -a and later filter manually in perf report, focusing only on the pids/tids/dsos of your application. The second approach can be very useful if the profiled command makes interprocess requests to other daemons that do a lot of the CPU work.

    The --per-core, -A, -C <cpulist>, and --per-socket options are only for system-wide (-a) mode. For a single process, try --per-thread together with the -p <pid> option, which attaches to a running process.
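
A minimal sketch of the attach-by-pid variant, assuming a background `sleep` as a stand-in for your own program (the workload, the durations, and the fallback messages are placeholders, not part of the original answer). It skips the perf call if perf is not installed:

```shell
# Start a stand-in workload in the background and remember its PID.
# Replace "sleep 1" with your own program.
sleep 1 &
pid=$!

# Attach to that one process and report task-clock per thread.
# There is no -a here, so other processes and idle CPUs are not counted.
# The trailing "sleep 1" only bounds how long perf keeps counting.
if command -v perf >/dev/null 2>&1; then
    perf stat --per-thread -e task-clock -p "$pid" sleep 1 \
        || echo "perf could not attach (check permissions)"
else
    echo "perf not installed; would run: perf stat --per-thread -p $pid sleep 1"
fi

# Reap the background workload.
wait "$pid"
```

With a real workload, the task-clock shown should cover only the threads of that process, which is the figure the question was after.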