Search code examples
linuxlinux-kerneltraceftrace

Is there a way to output function durations with trace-cmd that is exclusive of nested function durations?


I am tracing a load generator with trace-cmd. Here is a snippet of output from my trace-cmd report:

     loadgen-2682  [003]   105.979473: funcgraph_entry:                   |              select_task_rq_fair() {
     loadgen-2682  [003]   105.979473: funcgraph_entry:                   |                select_idle_sibling() {
     loadgen-2682  [003]   105.979473: funcgraph_entry:        0.071 us   |                  idle_cpu();
     loadgen-2682  [003]   105.979474: funcgraph_entry:        0.111 us   |                  idle_cpu();
     loadgen-2682  [003]   105.979474: funcgraph_entry:        0.056 us   |                  idle_cpu();
     loadgen-2682  [003]   105.979475: funcgraph_entry:        0.056 us   |                  idle_cpu();
     loadgen-2682  [003]   105.979475: funcgraph_exit:         2.324 us   |                }
     loadgen-2682  [003]   105.979476: funcgraph_exit:         2.940 us   |              }
     loadgen-2682  [003]   105.979476: funcgraph_entry:                   |              set_task_cpu() {
     loadgen-2682  [003]   105.979476: funcgraph_entry:                   |                migrate_task_rq_fair() {
     loadgen-2682  [003]   105.979476: funcgraph_entry:        0.080 us   |                  remove_entity_load_avg();
     loadgen-2682  [003]   105.979477: funcgraph_exit:         0.537 us   |                }
     loadgen-2682  [003]   105.979477: funcgraph_entry:        0.042 us   |                set_task_rq_fair();
     loadgen-2682  [003]   105.979478: funcgraph_exit:         1.463 us   |              }

The durations of all functions look suspiciously like they are inclusive of time spent in nested function calls. For example, I believe select_idle_sibling()'s duration of 2.324 us is including the .294 us (total) spent inside of the idle_cpu() calls. Similarly, I believe select_task_rq_fair() is inclusive of the select_idle_sibling() duration.

What I am asking is if there is a way within trace-cmd to turn this off, such that select_task_rq_fair() would show its duration exclusive of any functions it has called; i.e. select_task_rq_fair() would show a duration of 0.616 us (2.940 us - 2.324 us). I was unable to locate an answer to this question in the manpage.


Solution

  • After having spent some time on this problem, I can say that the answer is almost certainly "no." If you want to calculate the time spent inside of a function which is exclusive of time spent in any nested functions within, your will need:

    • A stack which pushes funcgraph_entries that will have nested calls (i.e., funcgraph_entries which do not have a listed duration) and pops when it sees the entries' corresponding funcgraph_exit calls
    • A way to keep track of the durations of nested calls, so that you may subtract their summed durations from the duration listed on your target's funcgraph_exit line

    This can be done in a few ways, but a simple class might look something like:

    class FunctionDuration:
        def __init__(self, function_name):
            name = function_name
            nested_call_duration = 0
            duration = 0