Tags: c++, valgrind, callgrind

Getting consistent callgrind output over multiple runs


I've been using valgrind to do some profiling of a C++ codebase, and my understanding is that it samples individual calls to build its profile data. As a result, I can profile the same application twice using something like this:

valgrind --tool=callgrind myprogram.exe

and get different results each time. For instance, my last two runs gave me the following output at the bottom:

==70741== 
==70741== Events    : Ir
==70741== Collected : 196823780
==70741== 
==70741== I   refs:      196,823,780

and

==70758== 
==70758== Events    : Ir
==70758== Collected : 195728098
==70758== 
==70758== I   refs:      195,728,098

this all makes complete sense to me. However, I'm currently in the process of optimizing my codebase, and I'd like to be able to make a change and determine whether it actually improves performance. Due to sampling, it seems that running callgrind alone will not be sufficient, since I can get different numbers on each run. As a result, it would be difficult to tell whether my latest run was faster just due to random sampling, or whether my change actually made a significant difference (in the statistical sense, why not?).

My question is: Is there a way I can force callgrind to be consistent in its sampling? Or is there some more appropriate higher-level tool that would allow me to understand how a refactor affects performance? I'm currently developing on macOS Sierra.


Solution

  • callgrind does not use a sampling technique. Instead, it exactly counts the instructions that are executed. So, if a program is run twice under callgrind and does exactly the same thing, it will give the same result. However, as soon as your program does anything non-trivial (e.g. uses a lot of libraries), those libraries might do slightly different things depending, for example, on the clock, the system load, or the contents of the environment. One way to reduce this variation is to pin the environment down, as in the first sketch after this answer.

    So, the differences you see are not due to callgrind; they arise because your program and/or the libraries it uses really are doing slightly different things on each run.

    You can use a callgrind output-file visualisation or reporting tool, such as kcachegrind, to analyse the difference between two runs. You might then see the origin of these differences and eliminate them. Otherwise, you should be able to determine the effect of your changes by looking only at the cost of the functions you are interested in, as in the second sketch below.
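
    For example, one way to reduce run-to-run variation coming from the environment is to launch the program with a cleared, fixed environment. This is a minimal sketch, assuming a POSIX shell; the PATH value and the output file names are placeholders to adapt to your setup:

    env -i PATH=/usr/bin:/bin valgrind --tool=callgrind --callgrind-out-file=callgrind.out.before myprogram.exe
    env -i PATH=/usr/bin:/bin valgrind --tool=callgrind --callgrind-out-file=callgrind.out.after myprogram.exe

    With the environment fixed, a remaining difference in the Ir counts is more likely to come from your code change itself.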
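
    To compare per-function costs between two runs, callgrind_annotate (shipped with valgrind) prints the Ir count for each function from a callgrind output file. The file names here match the sketch above:

    callgrind_annotate callgrind.out.before > before.txt
    callgrind_annotate callgrind.out.after > after.txt
    diff before.txt after.txt

    Even if the total Ir count drifts between runs, the functions you refactored should show a clear change in their own counts.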