c++ performance gcc4

g++, doubles, optimization and a big WTF


Bug in my gcc? Bug in my code? Both?

http://files.minthos.com/code/speedtest_doubles_wtf.cpp

Somehow, g++ manages to "optimize" a function whose only result is a zeroed-out array of doubles so that it takes 2.6 seconds on my Q6600, whereas the more complex function, which fills the array with something somewhat meaningful, takes only 33 ms.

I'd be interested to know whether others get similar results and, if so, whether anyone can explain what's going on. I'd also like to figure out what causes the huge difference between integer and floating-point performance (especially when compiling without optimization).


Solution

  • Line 99:

    memcpy(floats, ints, sizeof(floats));
    

    partially initializes floats[], in effect with floating-point garbage; the rest of the array remains zero. This happens because the doubles are overwritten with integer bit patterns, which are then interpreted as doubles. Perhaps the overflows and underflows are affecting performance? To test this, I changed the random number seed to a constant 1000 for reproducibility and got these results:

    [wally@zenetfedora Downloads]$ ./speedtest_doubles_wtf.cpp
    no optimization
    begin: 0.017000
    floats: 27757.816000
    ints: 28117.604000
    floats: 40346.196000
    ints: 41094.988000
    sum: 7999999.998712
    sum2: 67031739228347449344.000000
    mild optimization
    begin: 0.014000
    floats: 68.574000
    ints: 68.609000
    floats: 147.105000
    ints: 820.609000
    sum: 8000000.000001
    sum2: 67031739228347441152.000000
    heavier optimization
    begin: 0.014000
    floats: 73.588000
    ints: 73.623000
    floats: 144.105000
    ints: 1809.980000
    sum: 8000000.000001
    sum2: 67031739228347441152.000000
    again, now using ffun2()
    no optimization
    begin: 0.017000
    floats: 22720.648000
    ints: 23076.134000
    floats: 35480.824000
    ints: 36229.484000
    floats: 46324.080000
    sum: 0.000000
    sum2: 67031739228347449344.000000
    mild optimization
    begin: 0.013000
    floats: 69.937000
    ints: 69.967000
    floats: 138.010000
    ints: 965.654000
    floats: 19096.902000
    sum: 0.000000
    sum2: 67031739228347441152.000000
    heavier optimization
    begin: 0.015000
    floats: 95.851000
    ints: 95.896000
    floats: 206.594000
    ints: 1699.698000
    floats: 29382.348000
    sum: 0.000000
    sum2: 67031739228347441152.000000
    

    Repeating the test after replacing the memcpy with a proper assignment, so that type conversion can occur, should avoid those floating-point boundary conditions:

    for(int i = 0; i < 16; i++)
    {
        ints[i] = rand();
        floats[i]= ints[i];
    }
    

    The modified program, still with constant 1000 as random seed, provides these results:

    [wally@zenetfedora Downloads]$ ./speedtest_doubles_wtf.cpp
    no optimization
    begin: 0.013000
    floats: 35814.832000
    ints: 36172.180000
    floats: 85950.352000
    ints: 86691.680000
    sum: inf
    sum2: 67031739228347449344.000000
    mild optimization
    begin: 0.013000
    floats: 33136.644000
    ints: 33136.678000
    floats: 51600.436000
    ints: 52494.104000
    sum: inf
    sum2: 67031739228347441152.000000
    heavier optimization
    begin: 0.013000
    floats: 31914.496000
    ints: 31914.540000
    floats: 48611.204000
    ints: 49971.460000
    sum: inf
    sum2: 67031739228347441152.000000
    again, now using ffun2()
    no optimization
    begin: 0.014000
    floats: 40202.956000
    ints: 40545.120000
    floats: 104679.168000
    ints: 106142.824000
    floats: 144527.936000
    sum: inf
    sum2: 67031739228347449344.000000
    mild optimization
    begin: 0.014000
    floats: 33365.716000
    ints: 33365.752000
    floats: 49180.112000
    ints: 50145.824000
    floats: 80342.648000
    sum: inf
    sum2: 67031739228347441152.000000
    heavier optimization
    begin: 0.014000
    floats: 31515.560000
    ints: 31515.604000
    floats: 47947.088000
    ints: 49016.240000
    floats: 78929.784000
    sum: inf
    sum2: 67031739228347441152.000000
    

    This is an older PC, circa 2004, otherwise lightly loaded.

    It looks like that made matters slower. Perhaps there are fewer zeroes to do arithmetic with? Zeroes, or values like 0.0000000000000000000000000382652, are what many random bit patterns look like when read as doubles. Once such a value is added to, say, 0.1, its low bits tend to be lost.