I ran a benchmark of mine on my computer (Intel i3-3220 @ 3.3GHz, Fedora 18), and got very unexpected results. A function pointer was actually a bit faster than an inline function.
Code:
#include <iostream>
#include <chrono>
inline short toBigEndian(short i)
{
return (i<<8)|(i>>8);
}
short (*toBigEndianPtr)(short i)=toBigEndian;
int main()
{
std::chrono::duration<double> t;
int total=0;
for(int i=0;i<10000000;i++)
{
auto begin=std::chrono::high_resolution_clock::now();
short a=toBigEndian((short)i);//toBigEndianPtr((short)i);
total+=a;
auto end=std::chrono::high_resolution_clock::now();
t+=std::chrono::duration_cast<std::chrono::duration<double>>(end-begin);
}
std::cout<<t.count()<<", "<<total<<std::endl;
return 0;
}
compiled with
g++ test.cpp -std=c++0x -O0
The 'toBigEndian' loop finishes always at around 0.26-0.27 seconds, while 'toBigEndianPtr' takes 0.21-0.22 seconds.
What makes this even more odd is that when I remove 'total', the function pointer becomes the slower one at 0.35-0.37 seconds, while the inline function is at about 0.27-0.28 seconds.
My question is:
Why is the function pointer faster than the inline function when 'total' exists?
Oh s**t (do I need to censor swearing here?), I found it out. It was somehow related to the timing being inside the loop. When I moved it outside as following,
#include <iostream>
#include <chrono>
inline short toBigEndian(short i)
{
return (i<<8)|(i>>8);
}
short (*toBigEndianPtr)(short i)=toBigEndian;
int main()
{
int total=0;
auto begin=std::chrono::high_resolution_clock::now();
for(int i=0;i<100000000;i++)
{
short a=toBigEndianPtr((short)i);
total+=a;
}
auto end=std::chrono::high_resolution_clock::now();
std::cout<<std::chrono::duration_cast<std::chrono::duration<double>>(end-begin).count()<<", "<<total<<std::endl;
return 0;
}
the results are just as they should be. 0.08 seconds for inline, 0.20 seconds for pointer. Sorry for bothering you guys.