perf top result about nested functions

We are using perf top to show CPU usage. The result shows two functions:

samples    pcnt    function
------     ----    ---------
...        ...     ....
12617.00   6.8%    func_outside
 8691.00   4.7%    func_inside
.....

In fact, these two functions are nested like this, and the nesting is always one to one:

func_outside() {
  ....
  func_inside() 
  ... 
}

Should I conclude that, in the perf top result, the 4.7% is already included in the 6.8%? And that, excluding the cost of func_inside, func_outside costs 2.1% (6.8 - 4.7)?

Solution

Short Answer

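No. Each reported percentage is for that specific function only, so the func_inside samples are not counted in func_outside. The 6.8% is the cost of func_outside's own code. If func_inside is only ever called from func_outside, the cost of func_outside including that call is roughly 6.8% + 4.7% = 11.5%.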

Details

perf works by periodically collecting performance samples. By default, perf top simply checks which function is running when a sample is taken and adds that sample to the count for that function.
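The attribution rule described above can be sketched as a small model. This is an illustration only, not how perf is implemented: `struct sample`, `self_count`, and `inclusive_count` are hypothetical names, and each sample is assumed to carry the call chain that was active when it fired. The self count looks only at the innermost frame, which matches the per-function numbers discussed here; the inclusive count is what the question assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one sample: the call chain at the moment the
 * sample fired, outermost caller first, running function last. */
struct sample {
    const char *frames[4];
    size_t depth;
};

/* Self count: samples whose innermost (currently running) frame is fn. */
size_t self_count(const struct sample *s, size_t n, const char *fn) {
    assert(s != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].depth > 0 && strcmp(s[i].frames[s[i].depth - 1], fn) == 0) {
            hits++;
        }
    }
    return hits;
}

/* Inclusive count: samples where fn appears anywhere in the call chain. */
size_t inclusive_count(const struct sample *s, size_t n, const char *fn) {
    assert(s != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < s[i].depth; d++) {
            if (strcmp(s[i].frames[d], fn) == 0) {
                hits++;
                break; /* count each sample at most once */
            }
        }
    }
    return hits;
}
```

With four made-up samples (two taken inside inner, one inside outer's own code, one inside main), the self count for outer is 1 while its inclusive count is 3. Only the inclusive view folds the callee's samples into the caller.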

I was fairly sure this was the case, but I wanted to verify that this is how perf top displays its results, so I wrote a quick test program. It has two functions of interest, outer and inner. The outer function calls inner in a loop, and the amount of work inner does is controlled by an argument. The command line arguments therefore control the ratio of work between the two functions. When compiling, be sure to use -O0 to avoid inlining.

Running with parameters ./a.out 1 1 1000000000 gives results:

49.20%  a.out             [.] outer    
23.69%  a.out             [.] main    
21.32%  a.out             [.] inner

Running with parameters ./a.out 1 10 1000000000 gives results:

66.06%  a.out             [.] inner    
17.77%  a.out             [.] outer    
 9.50%  a.out             [.] main

Running with parameters ./a.out 1 100 1000000000 gives results:

88.53%  a.out             [.] inner    
 2.85%  a.out             [.] outer    
 1.09%  a.out             [.] main

If the count for inner were included in outer, the percentage for outer would always be at least as high as the percentage for inner. As these results show, that is not the case.
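The same argument can be written as a mechanical check over the three runs above. This is only a sketch of the reasoning: `struct run` and `count_inclusive_violations` are hypothetical names, and the caller is expected to fill in the outer and inner percentages from the perf top output.

```c
#include <assert.h>
#include <stddef.h>

/* Percentages reported by perf top for outer and inner in one run. */
struct run {
    double outer_pct;
    double inner_pct;
};

/* Under inclusive accounting every sample in inner would also count for
 * outer, so outer_pct >= inner_pct would have to hold in every run.
 * Returns how many runs contradict that. */
size_t count_inclusive_violations(const struct run *runs, size_t n) {
    assert(runs != NULL || n == 0);
    size_t violations = 0;
    for (size_t i = 0; i < n; i++) {
        if (runs[i].outer_pct < runs[i].inner_pct) {
            violations++;
        }
    }
    return violations;
}
```

Feeding in the measured pairs (49.20, 21.32), (17.77, 66.06) and (2.85, 88.53) yields two violations, which is enough to rule out inclusive accounting.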

The test program I used is below and was compiled with gcc -O0 -g --std=c11 test.c.

#include <stdlib.h>
#include <stdio.h>

long inner(int count) {
  long sum = 0;
  for(int i = 0; i < count; i++) {
    sum += i;
  }
  return sum;
}

long outer(int count_out, int count_in) {
  long sum = 0;
  for(int i = 0; i < count_out; i++) {
    sum += inner(count_in);
  }
  return sum;
}

int main(int argc, char **argv)  {
  if(argc < 4) {
    printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n",argv[0]);
    exit(-1);
  }

  int outer_cnt = atoi(argv[1]);
  int inner_cnt = atoi(argv[2]);
  int loops     = atoi(argv[3]);

  long res = 0;
  for(int i = 0; i < loops; i++) {
    res += outer(outer_cnt, inner_cnt);
  }

  printf("res is %ld\n", res);
  return 0;
}