c++performance function-pointers inline-functions

Function pointer runs faster than inline function. Why?

I ran a benchmark of mine on my computer (Intel i3-3220 @ 3.3GHz, Fedora 18), and got very unexpected results. A function pointer was actually a bit faster than an inline function.

Code:

#include <iostream>
#include <chrono>
inline short toBigEndian(short i)
{
    return (i<<8)|(i>>8);
}
short (*toBigEndianPtr)(short i)=toBigEndian;
int main()
{  
    std::chrono::duration<double> t;
    int total=0;
    for(int i=0;i<10000000;i++)
    {
        auto begin=std::chrono::high_resolution_clock::now();
        short a=toBigEndian((short)i);//toBigEndianPtr((short)i);
        total+=a;
        auto end=std::chrono::high_resolution_clock::now();
        t+=std::chrono::duration_cast<std::chrono::duration<double>>(end-begin);
    }
    std::cout<<t.count()<<", "<<total<<std::endl;
    return 0;
}

compiled with

g++ test.cpp -std=c++0x -O0

The 'toBigEndian' loop finishes always at around 0.26-0.27 seconds, while 'toBigEndianPtr' takes 0.21-0.22 seconds.

What makes this even more odd is that when I remove 'total', the function pointer becomes the slower one at 0.35-0.37 seconds, while the inline function is at about 0.27-0.28 seconds.

My question is:

Why is the function pointer faster than the inline function when 'total' exists?

Solution

Oh s**t (do I need to censor swearing here?), I found it out. It was somehow related to the timing being inside the loop. When I moved it outside as following,

#include <iostream>
#include <chrono>
inline short toBigEndian(short i)
{
    return (i<<8)|(i>>8);
}

short (*toBigEndianPtr)(short i)=toBigEndian;
int main()
{  
    int total=0;
    auto begin=std::chrono::high_resolution_clock::now();
    for(int i=0;i<100000000;i++)
    {
        short a=toBigEndianPtr((short)i);
        total+=a;
    }
    auto end=std::chrono::high_resolution_clock::now();
    std::cout<<std::chrono::duration_cast<std::chrono::duration<double>>(end-begin).count()<<", "<<total<<std::endl;
    return 0;
}

the results are just as they should be. 0.08 seconds for inline, 0.20 seconds for pointer. Sorry for bothering you guys.