c++ performance function-pointers inline-functions

Function pointer runs faster than inline function. Why?


I ran a benchmark of mine on my computer (Intel i3-3220 @ 3.3GHz, Fedora 18), and got very unexpected results. A function pointer was actually a bit faster than an inline function.

Code:

#include <iostream>
#include <chrono>
inline short toBigEndian(short i)
{
    return (i<<8)|(i>>8);
}
short (*toBigEndianPtr)(short i)=toBigEndian;
int main()
{  
    std::chrono::duration<double> t(0); // start the accumulator at zero; a default-constructed duration is uninitialized
    int total=0;
    for(int i=0;i<10000000;i++)
    {
        auto begin=std::chrono::high_resolution_clock::now();
        short a=toBigEndian((short)i); // for the pointer variant, use toBigEndianPtr((short)i) instead
        total+=a;
        auto end=std::chrono::high_resolution_clock::now();
        t+=std::chrono::duration_cast<std::chrono::duration<double>>(end-begin);
    }
    std::cout<<t.count()<<", "<<total<<std::endl;
    return 0;
}

compiled with

g++ test.cpp -std=c++0x -O0

The 'toBigEndian' loop always finishes in around 0.26-0.27 seconds, while the 'toBigEndianPtr' loop takes 0.21-0.22 seconds.

What makes this even more odd is that when I remove 'total', the function pointer becomes the slower one at 0.35-0.37 seconds, while the inline function is at about 0.27-0.28 seconds.

My question is:

Why is the function pointer faster than the inline function when 'total' exists?


Solution

  • I figured it out. It was related to the timing calls being inside the loop. When I moved them outside the loop, as follows,

    #include <iostream>
    #include <chrono>
    inline short toBigEndian(short i)
    {
        return (i<<8)|(i>>8);
    }
    
    short (*toBigEndianPtr)(short i)=toBigEndian;
    int main()
    {  
        int total=0;
        auto begin=std::chrono::high_resolution_clock::now();
        for(int i=0;i<100000000;i++)
        {
            short a=toBigEndianPtr((short)i);
            total+=a;
        }
        auto end=std::chrono::high_resolution_clock::now();
        std::cout<<std::chrono::duration_cast<std::chrono::duration<double>>(end-begin).count()<<", "<<total<<std::endl;
        return 0;
    }
    

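    the results are just as they should be: 0.08 seconds for inline, 0.20 seconds for pointer. (Note that this version runs 100,000,000 iterations, ten times as many as the original, so the absolute numbers are not directly comparable to the ones in the question.) Sorry for bothering you guys.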
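
    For anyone who wants to reproduce this, here is a rough sketch of the two measurement styles side by side. It is not the exact code from the question: `swapBytes`, `swapBytesPtr`, `timeLoop` and `clockPairSeconds` are names made up for this illustration, the byte swap is written on `std::uint16_t` instead of `short` so the shifts are well defined for every input, and `std::chrono::steady_clock` is used as an assumption about which clock suits interval timing best.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Same byte swap as in the question, but on unsigned 16-bit values
// (an assumption made here to avoid shifting negative numbers).
inline std::uint16_t swapBytes(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Non-const global pointer, so the call below goes through the pointer.
std::uint16_t (*swapBytesPtr)(std::uint16_t) = swapBytes;

// Times `iterations` calls of `f`, reading the clock only twice, outside the loop.
// Returns {elapsed seconds, accumulated total}.
template <typename F>
std::pair<double, long long> timeLoop(F f, int iterations)
{
    long long total = 0;
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++)
        total += f(static_cast<std::uint16_t>(i));
    auto end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(end - begin).count(), total};
}

// Sums the time between back-to-back now() calls with no work in between.
// This approximates the clock overhead that the in-loop timing was picking up.
double clockPairSeconds(int iterations)
{
    std::chrono::duration<double> t(0);
    for (int i = 0; i < iterations; i++) {
        auto begin = std::chrono::steady_clock::now();
        auto end = std::chrono::steady_clock::now();
        t += end - begin;
    }
    return t.count();
}
```

    To compare, call `timeLoop([](std::uint16_t v){ return swapBytes(v); }, n)` for the direct call and `timeLoop(swapBytesPtr, n)` for the pointer call from `main`, and print both `.first` values along with the totals. The lambda is there because passing `swapBytes` by name would turn it into a function pointer as well. If `clockPairSeconds(n)` comes out in the same range as the per-iteration timings from the question, the original benchmark was mostly measuring the clock. Also keep in mind that the `inline` keyword alone does not make GCC inline the call at -O0, so the gap between the two variants depends heavily on the optimization level.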