Tags: performance, profiling, performance-testing, sampling, gprof

Do (sampling) profilers still "lie" these days?


Most of my limited experience with profiling native code is on a GPU rather than on a CPU, but I see some CPU profiling in my future...

Now, I've just read this blog post:

How profilers lie: The case of gprof and KCacheGrind

about the gap between what profilers measure and what they show you, which is likely not what you expect if you're interested in distinguishing between different call paths and the time spent in each of them.

My question is: Is this still the case today (5 years later)? That is, do sampling profilers (i.e. those that don't slow execution down terribly) still behave the way gprof used to (or the way callgrind does without --separate-callers=N)? Or do profilers nowadays customarily record the entire call stack when sampling?


Solution

  • Regarding gprof, yes, it's still the case today. This is by design, to keep the profiling overhead small. From the up-to-date documentation:

    Some of the figures in the call graph are estimates—for example, the children time values and all the time figures in caller and subroutine lines.

    There is no direct information about these measurements in the profile data itself. Instead, gprof estimates them by making an assumption about your program that might or might not be true.

    The assumption made is that the average time spent in each call to any function foo is not correlated with who called foo. If foo used 5 seconds in all, and 2/5 of the calls to foo came from a, then foo contributes 2 seconds to a’s children time, by assumption.

  • Regarding KCacheGrind, little has changed since the article was written. You can check the change log and see that the latest version was published on April 5, 2013, and its changes are unrelated to this issue. You can also refer to Josef Weidendorfer's comments under the article (Josef is the author of KCacheGrind).
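To see why the assumption quoted from the gprof documentation can mislead, here is a small sketch of the proportional attribution it describes. The call counts and per-caller times are made-up illustrative numbers (chosen to match the 5-second, 2/5-of-calls example above); gprof itself never sees the per-caller times, which is exactly the problem.

```python
# Hypothetical ground truth that gprof cannot observe:
# foo is called from a and b, and calls from b are much more expensive.
actual = {
    "a": {"calls": 2, "time": 1.0},  # 0.5 s per call
    "b": {"calls": 3, "time": 4.0},  # ~1.33 s per call
}

total_calls = sum(c["calls"] for c in actual.values())  # 5 calls
total_time = sum(c["time"] for c in actual.values())    # 5.0 s in foo overall

# gprof's estimate: split foo's total time across callers in proportion
# to call counts, assuming per-call cost does not depend on the caller.
estimated = {
    caller: total_time * c["calls"] / total_calls
    for caller, c in actual.items()
}

for caller in actual:
    print(f"{caller}: estimated {estimated[caller]:.1f} s, "
          f"actual {actual[caller]['time']:.1f} s")
```

Running this shows `a` charged an estimated 2.0 s (versus 1.0 s actually spent) and `b` charged 3.0 s (versus 4.0 s), i.e. the per-caller figures in the call graph can be off whenever per-call cost correlates with the caller, even though the 5.0 s total for foo is exact.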