c++ caching compiler-optimization static-functions

Are arguments loaded into the cache for empty functions?

I know that C++ compilers can optimize away calls to empty (static) functions.

Based on that knowledge, I wrote a piece of code that should get optimized away whenever some identifier is defined (using the -D option of the compiler). Consider the following dummy example:

#include <iostream>

#ifdef NO_INC

struct T {
    static inline void inc(int& v, int i) {}
};

#else

struct T {
    static inline void inc(int& v, int i) {
        v += i;
    }
};

#endif

int main(int argc, char* argv[]) {
    int a = 42;

    for (int i = 0; i < argc; ++i)
        T::inc(a, i);

    std::cout << a;
}

The desired behavior would be the following: Whenever the NO_INC identifier is defined (using -DNO_INC when compiling), all calls to T::inc(...) should be optimized away (due to the empty function body). Otherwise, the call to T::inc(...) should trigger an increment by some given value i.

I have two questions regarding this:

1. Is my assumption correct that calls to T::inc(...) do not affect performance negatively when I specify the -DNO_INC option, because the call to the empty function is optimized away?
2. Are the variables (a and i) still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet), even though the function body is empty?

Thanks for any advice!

Solution

Compiler Explorer is a very useful tool for inspecting the assembly generated for your program; reading the assembly is the only way to know for sure whether the compiler optimized something away. Demo.

With actually incrementing, your main looks like:

main:                                   # @main
        push    rax
        test    edi, edi
        jle     .LBB0_1
        lea     eax, [rdi - 1]
        lea     ecx, [rdi - 2]
        imul    rcx, rax
        shr     rcx
        lea     esi, [rcx + rdi]
        add     esi, 41
        jmp     .LBB0_3
.LBB0_1:
        mov     esi, 42
.LBB0_3:
        mov     edi, offset std::cout
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        xor     eax, eax
        pop     rcx
        ret

As you can see, the compiler completely inlined the call to T::inc and does the incrementing directly.

For an empty T::inc you get:

main:                                   # @main
        push    rax
        mov     edi, offset std::cout
        mov     esi, 42
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        xor     eax, eax
        pop     rcx
        ret

The compiler optimized away the entire loop!

Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?

Yes.

If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?

No, for some definition of "complex". Compilers use heuristics to decide whether inlining a function is worthwhile, and they base the decision on those heuristics and nothing else.

I wonder if the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet) although the function body is empty.

No, as demonstrated above, the loop doesn't even exist.