
perf top result about nested functions

We are using perf top to show CPU usage. The result includes these two functions:

samples    pcnt    function
------     ----    ---------
...        ...     ....
12617.00   6.8%    func_outside
 8691.00   4.7%    func_inside

These two functions are nested like this, with func_inside always called exactly once per call of func_outside:

func_outside() {
    ...
    func_inside();
    ...
}

Should I conclude that the 4.7% in the perf top result is already included in the 6.8%, so that, excluding the cost of func_inside, func_outside itself costs only 2.1% (6.8 - 4.7)?


  • Short Answer

    No. Each percentage that perf top reports is for that specific function only, so the func_inside samples are not counted in func_outside.
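    Applied to the numbers in the question, no subtraction is needed: 6.8% is already func_outside's own ("self") cost. If you want its inclusive cost instead, the two self percentages add rather than subtract. Assuming func_inside is called only from func_outside (as the 1 to 1 nesting suggests), every sample taken in func_inside also has func_outside on its stack, and those samples are disjoint from the ones taken in func_outside itself, so this gives a lower bound (samples in any other callees would add more):

```latex
\text{incl}(\texttt{func\_outside}) \;\ge\; \text{self}(\texttt{func\_outside}) + \text{self}(\texttt{func\_inside}) = 6.8\% + 4.7\% = 11.5\%
```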


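    perf works by periodically collecting samples. By default, for each sample perf top looks only at the function that is executing at that moment (the one containing the sampled instruction pointer) and adds the sample to that function's count. Callers get no credit for it unless you ask perf to record call graphs (for example with -g).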

    I was fairly sure of this, but wanted to verify how perf top displays its results, so I wrote a quick test program. It has two functions of interest, outer and inner. outer calls inner in a loop, and the amount of work inner does is controlled by an argument. When compiling, be sure to use -O0 so that inner is not inlined into outer; an inlined function gets no samples of its own. The command-line arguments (outer_cnt, inner_cnt, loop) control the ratio of work between the two functions.

    Running with parameters ./a.out 1 1 1000000000 gives results:

    49.20%  a.out             [.] outer    
    23.69%  a.out             [.] main    
    21.32%  a.out             [.] inner    

    Running with parameters ./a.out 1 10 1000000000 gives results:

    66.06%  a.out             [.] inner    
    17.77%  a.out             [.] outer    
     9.50%  a.out             [.] main    

    Running with parameters ./a.out 1 100 1000000000 gives results:

    88.53%  a.out             [.] inner    
     2.85%  a.out             [.] outer    
     1.09%  a.out             [.] main    

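    If the samples for inner were included in outer's count, outer's percentage would always be at least as high as inner's, because inner is only ever called from outer. As these results show, that is not the case: with inner_cnt set to 10 or 100, inner clearly outranks outer.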

    The test program I used is below and was compiled with gcc -O0 -g --std=c11 test.c.

    #include <stdlib.h>
    #include <stdio.h>

    /* inner does the real work: its loop runs count times per call. */
    long inner(int count) {
      long sum = 0;
      for(int i = 0; i < count; i++) {
        sum += i;
      }
      return sum;
    }

    /* outer's own cost is only its loop overhead and the calls to inner. */
    long outer(int count_out, int count_in) {
      long sum = 0;
      for(int i = 0; i < count_out; i++) {
        sum += inner(count_in);
      }
      return sum;
    }

    int main(int argc, char **argv)  {
      if(argc < 4) {
        printf("Usage: %s <outer_cnt> <inner_cnt> <loop>\n",argv[0]);
        return 1;
      }
      int outer_cnt = atoi(argv[1]);
      int inner_cnt = atoi(argv[2]);
      int loops     = atoi(argv[3]);
      long res = 0;
      for(int i = 0; i < loops; i++) {
        res += outer(outer_cnt, inner_cnt);
      }
      printf("res is %ld\n", res);
      return 0;
    }