We are using perf top to show CPU usage. The result shows two functions:
samples pcnt function
------ ---- ---------
... ... ....
12617.00 6.8% func_outside
8691.00 4.7% func_inside
.....
In fact, these two functions are nested like this, and func_outside always calls func_inside exactly once per invocation:
func_outside() {
    ....
    func_inside()
    ...
}
Should I conclude that, in the perf top result, the 4.7% is already included in the 6.8%? And that, excluding the cost of func_inside, func_outside costs 2.1% (6.8 - 4.7)?
No, each reported percentage is for that specific function only, so the func_inside samples are not counted in func_outside.
The way perf works is that it periodically collects performance samples. By default, perf top simply checks which function is currently running and adds that sample to the count for that function.
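To make the distinction concrete, here is a minimal sketch of the two ways a sampling profiler could attribute a sample. This is not perf's actual implementation. The struct sample type and both function names are hypothetical, made up purely for illustration. The "self" count credits only the innermost frame, which matches the default behavior described above. The "inclusive" count credits every function on the stack.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one profiler sample: the call stack at the
 * moment the sample was taken, listed from outermost to innermost. */
struct sample {
    const char *frames[4];
    size_t depth;
};

/* "Self" count: samples in which fn is the innermost frame,
 * i.e. the function that was actually executing. */
size_t self_samples(const struct sample *s, size_t n, const char *fn) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].depth > 0 &&
            strcmp(s[i].frames[s[i].depth - 1], fn) == 0) {
            hits++;
        }
    }
    return hits;
}

/* "Inclusive" count: samples in which fn appears anywhere on the
 * stack, so time spent in its callees is credited to it as well. */
size_t inclusive_samples(const struct sample *s, size_t n, const char *fn) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < s[i].depth; d++) {
            if (strcmp(s[i].frames[d], fn) == 0) {
                hits++;
                break; /* count each sample at most once */
            }
        }
    }
    return hits;
}
```

With self counting, a sample taken while func_inside is running adds nothing to func_outside, even though func_outside is on the stack at that moment.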
I was pretty sure this is the case, but I wanted to verify that this is how perf top displays its results, so I wrote a quick test program to check its behavior. The program has two functions of interest, outer and inner. The outer function calls inner in a loop, and the amount of work that inner does is controlled by an argument. When compiling, be sure to use -O0 to avoid inlining. The command line arguments control the ratio of work between the two functions.
Running with parameters ./a.out 1 1 1000000000 gives these results:
49.20% a.out [.] outer
23.69% a.out [.] main
21.32% a.out [.] inner
Running with parameters ./a.out 1 10 1000000000 gives these results:
66.06% a.out [.] inner
17.77% a.out [.] outer
9.50% a.out [.] main
Running with parameters ./a.out 1 100 1000000000 gives these results:
88.53% a.out [.] inner
2.85% a.out [.] outer
1.09% a.out [.] main
If the count for inner were included in outer, then the runtime percentage for outer would always be at least as high as that for inner. As these results show, that is not the case.
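Applying this back to the numbers in the question: because the percentages are exclusive, the total cost of a call to func_outside is a sum rather than a difference. A rough sketch of that arithmetic follows. It assumes func_inside is called only from func_outside and that any other callees of func_outside are negligible, neither of which is confirmed by the question. The helper name inclusive_pct is hypothetical.

```c
#include <stddef.h>

/* Estimate the inclusive percentage of a function by adding its own
 * (self) percentage to the self percentages of its callees.
 * Assumption: each callee is reached only through this function. */
double inclusive_pct(double self_pct, const double *callee_pcts, size_t n) {
    double total = self_pct;
    for (size_t i = 0; i < n; i++) {
        total += callee_pcts[i];
    }
    return total;
}
```

Under those assumptions, calling inclusive_pct with a self percentage of 6.8 and a single callee at 4.7 gives roughly 11.5, so func_outside on its own costs 6.8%, not 2.1%.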
The test program I used is below. It was compiled with gcc -O0 -g --std=c11 test.c.
    #include <stdlib.h>
    #include <stdio.h>

    /* Does count iterations of work; larger values shift time into inner. */
    long inner(int count) {
        long sum = 0;
        for(int i = 0; i < count; i++) {
            sum += i;
        }
        return sum;
    }

    /* Calls inner count_out times, passing count_in as its workload. */
    long outer(int count_out, int count_in) {
        long sum = 0;
        for(int i = 0; i < count_out; i++) {
            sum += inner(count_in);
        }
        return sum;
    }

    int main(int argc, char **argv) {
        if(argc < 4) {
            printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n",argv[0]);
            exit(-1);
        }
        int outer_cnt = atoi(argv[1]);
        int inner_cnt = atoi(argv[2]);
        int loops = atoi(argv[3]);
        long res = 0;
        /* Repeat the outer/inner call chain so the profiler gets enough samples. */
        for(int i = 0; i < loops; i++) {
            res += outer(outer_cnt, inner_cnt);
        }
        /* Print the result so the computed value is actually used. */
        printf("res is %ld\n", res);
        return 0;
    }