Search code examples
goprofilingpprof

What does pprof -call_tree do?


go tool pprof has a -call_tree option that, according to -help, should “Create a context-sensitive call tree”. However, pprof -tree on a CPU profile gives me the exact same output with and without this option. It looks like this (one representative node):

      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                             0.07s 35.00% |   google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders
                                             0.04s 20.00% |   golang.org/x/net/http2.(*Framer).readMetaFrame
                                             0.02s 10.00% |   github.com/Shopify/sarama.(*FetchResponse).decode
     0.06s  0.79% 51.18%      0.20s  2.63%                | runtime.mapassign_faststr
                                             0.05s 25.00% |   runtime.newobject (inline)
                                             0.03s 15.00% |   aeshashbody
                                             0.03s 15.00% |   runtime.mallocgc

This is clearly not a tree, not least because it shows Sarama under HTTP/2 — consuming from Kafka (via Sarama) and serving HTTP/2 are two independent things this process does concurrently.

Why doesn’t -call_tree affect this output of -tree? In general, what does -call_tree do?

Extra credit

What is the exact meaning of the output node I showed above? Does it mean that 35% of samples containing mapassign_faststr also contained operateHeaders somewhere up the stack, and 10% contained decode? What about the lines below, such as mallocgc?

What documents could I read to answer the above questions?


Solution

  • Why doesn’t -call_tree affect this output of -tree

    I think the -call_tree option doesn't change the output of -tree the outputs are not actually a tree, it outputs the nodes of the tree(more on this in the extra credit section).


    In general, what does -call_tree do?

    You can see a difference when you take a look at images generated with the -png flag, without the -call_tree flag: enter image description here And with the -call_tree flag: enter image description here

    So instead of having 1 call tree, pprof attempts to create seperate trees based on context. In my case(will list them since the text on the image is not readable) the roots are:

    • testing.(*B).launch (benchmark/test framework)
    • runtime.gcBgMarkWorker (part of the runtime GC)
    • runtime.bgsweep (part of the runtime GC)
    • runtime.mcall (part of the runtime scheduler)
    • runtime.morestack (something to do with the stack :) )

    In the non -call_tree image these nodes are still present, but start mid tree, as if our code calls these background processed directly.

    Basically basically what the option does is remove/hide infrequent calls between functions, so you will end up with a tree for each set of functions that call each other frequently.

    I haven't tested this but I imagine that pprof will do this context aware tree separation for user code as well. All in all it returns a subjectively more readable tree, or at least a more relevant one.


    What is the exact meaning of the output node I showed above?

    The -tree option attempts to output the tree, as shows in the images. But since it is text output it shows you 1 node of the tree at a time, the non indented line in the context column is the current node, the rows above are nodes that call the current node, the rows below the nodes which this one calls (the arrows in the image).

    The calls% is a the "weight" of the the incoming or outgoing edge, so indeed the percentage of calls from or to a function.


    What documents could I read to answer the above questions?

    If figured all of this out by looking at the source code, here are some key parts, in case you are interested: