
Callgrind: Profile a specific part of my code


I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about. Here is an example of what I want to do:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    //Method to be profiled with these data
    //Post operation on the data
}

My use case is a regression test: I want to make sure that the method in question is still fast enough (something like less than 10% extra instructions compared to the previous implementation). This is why I'd like cleaner output from Callgrind. (I need the for loop so that enough data is processed to get a good estimate of the behavior of the method I want to profile.)
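Below is a minimal sketch of the "less than 10% extra instructions" check. It assumes the baseline and current totals are read from two Callgrind output files. Such a file normally contains a "summary:" and/or "totals:" line, and with default options the first number on that line is the Ir count (instructions executed). The function names read_total_ir and exceeds_budget are hypothetical. The budget is passed as an integer percentage so the comparison at the boundary is exact.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the first number on the first "summary:" or
// "totals:" line of a Callgrind output stream. With default options the
// first (and only) event is Ir, i.e. instructions executed.
std::optional<std::uint64_t> read_total_ir(std::istream& in) {
    static const std::string prefixes[] = {"summary:", "totals:"};
    std::string line;
    while (std::getline(in, line)) {
        for (const std::string& prefix : prefixes) {
            if (line.rfind(prefix, 0) == 0) {  // line starts with prefix
                std::istringstream fields(line.substr(prefix.size()));
                std::uint64_t count = 0;
                if (fields >> count) {
                    return count;
                }
            }
        }
    }
    return std::nullopt;  // no total found in the stream
}

// Hypothetical regression check: true when `current` is more than
// `max_increase_percent` percent above `baseline`. Integer arithmetic
// keeps the boundary exact (e.g. 1100 vs. 1000 at 10% is still OK).
bool exceeds_budget(std::uint64_t baseline, std::uint64_t current,
                    unsigned max_increase_percent) {
    assert(baseline > 0);
    return current * 100 > baseline * (100 + max_increase_percent);
}
```

For example, after opening the baseline and new callgrind.out files as std::ifstream streams, the test would fail when exceeds_budget(*baseline_ir, *current_ir, 10) returns true.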

My first try was to change the code to:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    //Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

That is, I added the Callgrind macros to control the instrumentation. I also passed the --instr-atstart=no option to be sure that I profile only the part of the code I want...

Unfortunately, with this configuration, when I launch my executable under Callgrind it never finishes... It is not just slowness, because a fully instrumented run takes less than one minute.

I also tried:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

(as well as the --toggle-collect="myMethod" option), but Callgrind gave me a profile without any calls (KCachegrind is white as snow :( and reports zero instructions...).

Did I use the macros/options correctly? Any idea what I need to change to get the expected result?


Solution

  • I finally managed to solve this issue. It was a configuration problem:

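    I kept the code:

    #include <valgrind/callgrind.h>  // defines the CALLGRIND_* macros

    for (int i=0; i<maxSample; ++i) {
        //Prepare data to be processed...
        CALLGRIND_TOGGLE_COLLECT;  // collection ON (it starts OFF with --collect-atstart=no)
        //Method to be profiled with these data
        CALLGRIND_TOGGLE_COLLECT;  // collection OFF again
        //Post operation on the data
    }
    CALLGRIND_DUMP_STATS;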
    

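    but ran Callgrind with --collect-atstart=no (and without --instr-atstart=no!) and it worked perfectly, in a reasonable time (~1 min). With --instr-atstart=no, Callgrind does not instrument anything until CALLGRIND_START_INSTRUMENTATION is reached, so toggling collection alone had nothing to record.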

    The issue with START/STOP instrumentation was that Callgrind dumps a file (callgrind.out.#number) at each iteration (at each STOP), so it was really, really slow... (after 5 min I had only 5000 runs of a 300 000-iteration benchmark, which is unsuitable for a regression test).
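The working configuration can be sketched as a small, self-contained C++ function. The helpers prepare_data, method_to_profile and post_process, as well as run_benchmark itself, are hypothetical stand-ins for the question's loop. Only CALLGRIND_TOGGLE_COLLECT and CALLGRIND_DUMP_STATS come from <valgrind/callgrind.h>. The no-op fallback macros are an addition so the sketch also compiles where the Valgrind headers are not installed. The real macros already do nothing when the program is not running under Valgrind.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
// Fallback (an assumption of this sketch): no-op macros when the header is absent.
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_DUMP_STATS
#endif

// Hypothetical stand-ins for the three phases of the loop.
static std::vector<int> prepare_data(int i) {
    std::vector<int> data(64);
    std::iota(data.begin(), data.end(), i);  // data = i, i+1, ..., i+63
    return data;
}

static long long method_to_profile(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

static void post_process(long long& checksum, long long result) {
    checksum += result;
}

// Assumes Callgrind runs with --collect-atstart=no, so the first toggle in
// each iteration switches collection ON and the second switches it OFF.
long long run_benchmark(int max_sample) {
    assert(max_sample >= 0);
    long long checksum = 0;
    for (int i = 0; i < max_sample; ++i) {
        std::vector<int> data = prepare_data(i);   // not collected
        CALLGRIND_TOGGLE_COLLECT;                  // collection ON
        long long result = method_to_profile(data);
        CALLGRIND_TOGGLE_COLLECT;                  // collection OFF
        post_process(checksum, result);            // not collected
    }
    CALLGRIND_DUMP_STATS;  // write the collected counters to a dump file
    return checksum;
}
```

Calling run_benchmark from main and starting the program with something like valgrind --tool=callgrind --collect-atstart=no ./benchmark (binary name hypothetical) should then attribute collected costs only to the code between the two toggles. The result can be opened in KCachegrind as usual.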