I know that C++ compilers optimize empty (static) functions.
Based on that knowledge I wrote a piece of code that should get optimized away whenever some identifier is defined (using the -D option of the compiler).
Consider the following dummy example:
#include <iostream>

#ifdef NO_INC
struct T {
    static inline void inc(int& v, int i) {}
};
#else
struct T {
    static inline void inc(int& v, int i) {
        v += i;
    }
};
#endif

int main(int argc, char* argv[]) {
    int a = 42;
    for (int i = 0; i < argc; ++i)
        T::inc(a, i);
    std::cout << a;
}
The desired behavior would be the following:
Whenever the NO_INC identifier is defined (using -DNO_INC when compiling), all calls to T::inc(...) should be optimized away (due to the empty function body). Otherwise, the call to T::inc(...) should trigger an increment by some given value i.
I have two questions regarding this:
1. Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option, because the call to the empty function is optimized away?
2. I wonder whether the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet), although the function body is empty.
Thanks for any advice!
Compiler Explorer is a very useful tool for looking at the assembly generated for your program, and inspecting the assembly is the only way to know for sure whether the compiler optimized something away. Demo.
With the increment enabled, your main looks like:
main: # @main
push rax
test edi, edi
jle .LBB0_1
lea eax, [rdi - 1]
lea ecx, [rdi - 2]
imul rcx, rax
shr rcx
lea esi, [rcx + rdi]
add esi, 41
jmp .LBB0_3
.LBB0_1:
mov esi, 42
.LBB0_3:
mov edi, offset std::cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
As you can see, the compiler completely inlined the call to T::inc and does the incrementing directly.
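Reading the assembly more closely, the loop itself is gone as well: the compiler replaced it with a closed-form expression, (argc - 1) * (argc - 2) / 2 + argc + 41, which equals 42 plus the sum 0 + 1 + ... + (argc - 1). The following is a minimal sketch of that equivalence; the function names are made up for illustration, and n stands in for argc.

```cpp
#include <cassert>

// Same accumulation as the loop in main(), with n playing the role of argc.
int loop_result(int n) {
    int a = 42;
    for (int i = 0; i < n; ++i)
        a += i;
    return a;
}

// Mirrors the arithmetic in the assembly above:
// (n - 1) * (n - 2) / 2 + n + 41 for n > 0, and plain 42 otherwise.
int closed_form_result(int n) {
    if (n <= 0)
        return 42;
    long long product = static_cast<long long>(n - 1) * (n - 2);
    return static_cast<int>(product / 2 + n + 41);
}

// Hypothetical helper: checks that both versions agree for every n in [0, max_n].
// Keep max_n small enough that the sum does not overflow int.
void check_agreement(int max_n) {
    for (int n = 0; n <= max_n; ++n)
        assert(loop_result(n) == closed_form_result(n));
}
```

Calling check_agreement(1000) from a test confirms that the two functions return the same value over that range, which is why the compiler is allowed to drop the loop.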
For an empty T::inc you get:
main: # @main
push rax
mov edi, offset std::cout
mov esi, 42
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
The compiler optimized away the entire loop!
Is my assumption correct that calls to t.inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
Yes.
If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?
No, for some definition of "complex". Compilers use heuristics to decide whether inlining a function is worthwhile, and they base the decision on those heuristics and on nothing else.
I wonder if the variables (a and i) are still loaded into the cache when t.inc(a, i) is called (assuming they are not there yet) although the function body is empty.
No, as demonstrated above, the loop doesn't even exist.
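One caveat, as a sketch rather than something taken from the question: the loop can only vanish because the argument expressions (a and i) have no side effects. If an argument is itself a call with an observable side effect, the language still requires that call to be evaluated, even though the body of inc is empty. NoInc, next_step and step_calls below are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for T compiled with -DNO_INC: the body is empty.
struct NoInc {
    static inline void inc(int& /*v*/, int /*i*/) {}
};

// Hypothetical counter and helper: an argument expression with a side effect.
int step_calls = 0;

int next_step() {
    ++step_calls;
    return 1;
}

// The empty inc() does nothing, but next_step() must still run n times.
int run_with_side_effect(int n) {
    int a = 42;
    for (int i = 0; i < n; ++i)
        NoInc::inc(a, next_step());
    return a;
}

// Small usage check: a is unchanged, yet the argument was evaluated each time.
void demo() {
    step_calls = 0;
    int result = run_with_side_effect(3);
    assert(result == 42);
    assert(step_calls == 3);
}
```

So with -DNO_INC the call itself costs nothing, but any work done while computing its arguments stays unless the compiler can prove that work has no observable effect.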