Search code examples
perf

perf reports multiple sched:sched_stat_sleep event for a single sleep


I was trying to profile sample application

#include<stdio.h>
#include<unistd.h>

int calculate() {
        int i=0, ret;
        for(i=0;i<10000000;i++) {
                ret +=i;
        }
        return ret;
}

int main(int argc, char ** ch) {
        int loop;
        for(loop=0; loop<5; loop++){
                sleep(1);
                calculate();
        }
}

I used perf record to profile.

perf record -e sched:sched_stat_sleep -e sched:sched_switch -o perfnoallcpu.data -g ./loop

I see that for each switch there are 4 sched:sched_stat_sleep reported.

 swapper     0/0     [000]   222.777113:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777116:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777117:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
         swapper     0/0     [000]   222.777118:    1269841 sched:sched_stat_sleep: comm=perf pid=2927 delay=1269841 [ns]
            loop  2927/2927  [001]   222.777651:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   223.777227:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777230:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777231:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
         swapper     0/0     [001]   223.777232:  999577599 sched:sched_stat_sleep: comm=loop pid=2927 delay=999577599 [ns]
            loop  2927/2927  [001]   223.803313:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   224.802882:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802884:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802885:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
         swapper     0/0     [001]   224.802885:  999568643 sched:sched_stat_sleep: comm=loop pid=2927 delay=999568643 [ns]
            loop  2927/2927  [001]   224.814909:  999483089 sched:sched_stat_sleep: comm=kworker/1:1 pid=65 delay=999483089 [ns]
            loop  2927/2927  [001]   224.814913:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=65 next_prio=120
            loop  2927/2927  [001]   224.828492:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   225.828066:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828069:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828070:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
         swapper     0/0     [001]   225.828071:  999572687 sched:sched_stat_sleep: comm=loop pid=2927 delay=999572687 [ns]
            loop  2927/2927  [001]   225.829873:  301152896 sched:sched_stat_sleep: comm=compiz pid=1723 delay=301152896 [ns]
            loop  2927/2927  [001]   225.829877:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=R ==> next_comm=compiz next_pid=1723 next_prio=120
            loop  2927/2927  [001]   225.853764:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   226.853341:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853343:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853344:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
         swapper     0/0     [001]   226.853345:  999576357 sched:sched_stat_sleep: comm=loop pid=2927 delay=999576357 [ns]
            loop  2927/2927  [001]   226.878867:          1 sched:sched_switch: prev_comm=loop prev_pid=2927 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
         swapper     0/0     [001]   227.878442:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878443:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878444:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]
         swapper     0/0     [001]   227.878445:  999573760 sched:sched_stat_sleep: comm=loop pid=2927 delay=999573760 [ns]

number of sched_stat_sleep increases with number of cpus. This makes the application appear to have slept for more time.

If i specify -a option when doing perf record the issue is gone, we see only one sched_stat_sleep event per sleep.


Solution

  • This is a bug in the kernel, which affects also other events (the message from chengjian (D) shows that it affects sched:sched_stat_sleep). Apparently, the event is being counted as many times as there are CPUs (nrcpus). According to this diff, the fix is included in the kernel version 4.18.17 (which ensures that the event is only counted for the CPU on which it has occurred.