Tags: profiling, valgrind, callgrind

valgrind 'callgrind' instruction fetch differs between runs


I'm currently profiling a C++ executable to determine whether a code change affects performance. For profiling I am using 'valgrind/callgrind'.

I can't find any documentation explaining why 'Ir' (instructions read) differs between runs of the identical executable under test:

1st run:

==2800==
==2800== Events    : Ir
==2800== Collected : 68723519295
==2800==
==2800== I   refs:      68,723,519,295

2nd run:

==2821==
==2821== Events    : Ir
==2821== Collected : 68723289248
==2821==
==2821== I   refs:      68,723,289,248

I think this is due to CPU optimizations and is expected, but I would like to confirm this and have something to reference. If anyone knows the answers to the following questions and has reference details, I would be grateful:

  • Is it expected to have a different 'Ir' for each execution?
  • If it is expected, does that mean the 'Ir' number can't determine how much faster or slower a code change is? Or should it be treated only as an estimate of the improvement (e.g. if 'Ir' drops by about 15%, then the change is faster)?
  • Are there any valgrind/callgrind options that would minimise the variation, or perhaps even make the 'Ir' number consistent between test runs?

Thanks,


Solution

  • This is most likely due to timing-sensitive differences.

    For instance, if your application has a 100 ms timer, then the number of times the timer fires (and its signal handler executes) may depend on things like the load on the network and the disk. Each extra firing adds the handler's instructions to the 'Ir' total.

    You need to look at the per-function breakdown to see where the difference is coming from.
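As a rough sketch of that per-function comparison, the Python below diffs the 'Ir' counts from two text reports, such as those printed by `callgrind_annotate callgrind.out.<pid>`. The line shape it parses (a comma-grouped count, an optional percentage in parentheses, then `file:function`) is an assumption about the report layout and may need adjusting for your valgrind version. The helper names and the sample reports are hypothetical, made up purely for illustration.

```python
import re

# Assumed line shape: "<count>  [(pct%)]  <file:function [object]>".
# This is a guess at the report layout, not a documented format.
LINE_RE = re.compile(r"^\s*(\d[\d,]*)\s+(?:\([^)]*\)\s+)?(\S.*?)\s*$")


def parse_ir(text):
    """Map each function label to its Ir count, skipping the totals line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group(2) == "PROGRAM TOTALS":
            continue
        counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def diff_by_function(run_a, run_b):
    """Return (label, delta) pairs where Ir differs, largest change first."""
    labels = set(run_a) | set(run_b)
    deltas = {name: run_b.get(name, 0) - run_a.get(name, 0) for name in labels}
    changed = [(name, delta) for name, delta in deltas.items() if delta != 0]
    return sorted(changed, key=lambda item: (-abs(item[1]), item[0]))


def relative_change(total_a, total_b):
    """Signed change from total_a to total_b as a fraction of total_a."""
    return (total_b - total_a) / total_a


# Hypothetical reports: the function names and counts are invented.
RUN_1 = """
1,500  PROGRAM TOTALS
1,000  app.cpp:main [/tmp/app]
  500  timer.cpp:on_tick [/tmp/app]
"""
RUN_2 = """
1,350  PROGRAM TOTALS
1,000  app.cpp:main [/tmp/app]
  350  timer.cpp:on_tick [/tmp/app]
"""

if __name__ == "__main__":
    for name, delta in diff_by_function(parse_ir(RUN_1), parse_ir(RUN_2)):
        print(f"{delta:+,d}  {name}")
    # Totals taken from the two runs shown in the question.
    print(f"{relative_change(68723519295, 68723289248):.3e}")
```

If the difference concentrates in a few functions (a timer or signal handler, say), that points at the timing-sensitive code path. For the two totals in the question the gap is 230,047 instructions, roughly 0.0003% of the total, which is tiny next to a change on the order of 15%.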