This is a general efficiency question for c++. I am not familiar with the inner workings of compilers, so suppose I have several loops and a potential if statement inside, e.g.:
for(int i=0; ...)
{
for(int j=0; ...)
{
if( ... )
{
...
}
else
{
... (slightly different)
}
}
}
However, this if-statement is independent of the loops. Is there a significant speed difference if I instead define the if/else statement outside of the loops with the loops inside? E.g.:
if( ... )
{
for(int i=0; ...)
{
for(int j=0; ...)
{
...
}
}
}
else
{
for(int i=0; ...)
{
for(int j=0; ...)
{
... (slightly different)
}
}
}
If so, or if not so, why is that? I have some notion that a compiler will recognize the same if statement being done over and over, but this is quite unfamiliar territory to me.
I examined the response to this question: Would compiler optimize conditional statement in loop by moving it ouside the loop? and he discusses the different levels of optimization in gcc, and how -O3 (I think) would do that. But is anything done like this automatically? If not, how big of a cost is an if-statement like this inside of a loop?
The only real answer is maybe. If the condition is a loop
invariant, then the transposition you suggest is legal, and if
the compiler can recognize the loop invariance, then it can make
the transposition. Whether it does or not depends on the
compiler: g++ /O3
does, at least in 64 bit mode, cl /Ox /Os
doesn't, at least in 32 bit mode; g++ also unrolls the two loops.
In my tests, at least; I more or less guaranteed that the
compiler could determine that the condition was a loop invariant
by wrapping the loop in a function, with the condition
a function argument of type bool const
; depending on the
condition, it may be more or less difficult for the compiler to
prove loop invariance. And of course, the fact that the
compiler has more registers to play with in 64 bit mode could
also affect its optimizations .
Also: although I'd instinctively expect the g++ version to be faster, it is significantly larger; in some cases, this may negatively affect the various memory caches, resulting in the code actually running slower.
In the end, I'd write the first, always. If the profiler later shows it to be a bottleneck, there's no issue about going back and rewriting it along the lines of the second, then measuring to see if it makes a difference, one way or the other, and how much difference it makes. And be aware that the best results may depend on the compiler and the architecture you are targetting.