I have code that needs to run fast, and I am optimizing the heck out of the inner loop, which runs several hundred trillion times.
In pursuit of this, I have been writing several different versions of the code in this inner loop: some using naive methods, some using SSE intrinsics, and so on. The idea is that when I run it on a particular hardware combination, I can run a test, see which implementation / compiler flags combination works best, and use that one.
At first, when there were only two different methods, I used simple conditional compilation inside the loop, as follows:
do
{
#ifdef naive_loop
//more code here
#endif
#ifdef partially_unrolled_loop
//more code here
#endif
}
while( runNumber < maxRun );
Later, as the number of variations I tried grew, it turned into this:
#ifdef naive_loop
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
#ifdef partially_unrolled_loop
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
#ifdef sse_intrinsics
void CalcRunner::loopFunction()
{
//code goes here
}
#endif
//etc
However, this is making my file enormous and hard to read. Is there a more elegant way to do this?
You can use templates and template specialization to do the job. For example:
// Empty tag types, one per variant of the loop
struct naive_loop {};
struct partially_unrolled_loop {};

template <typename T>
class CalcRunner;

template <>
class CalcRunner<naive_loop>
{
public:
    void loopFunction() { /* ... */ }
};

template <>
class CalcRunner<partially_unrolled_loop>
{
public:
    void loopFunction() { /* ... */ }
};

// Now choose the variant you want at compile time
typedef CalcRunner<partially_unrolled_loop> CalcRunner_t;

int main()
{
    CalcRunner_t runner;
    runner.loopFunction();
}