Tags: c++, caching, compiler-optimization, static-functions

Are arguments loaded into the cache for empty functions?


I know that C++ compilers optimize away calls to empty (static) functions.

Based on that knowledge, I wrote a piece of code that should get optimized away whenever some identifier is defined (using the -D option of the compiler). Consider the following dummy example:

#include <iostream>

#ifdef NO_INC

struct T {
    static inline void inc(int& v, int i) {}
};

#else

struct T {
    static inline void inc(int& v, int i) {
        v += i;
    }
};

#endif

int main(int argc, char* argv[]) {
    int a = 42;

    for (int i = 0; i < argc; ++i)
        T::inc(a, i);

    std::cout << a;
}

The desired behavior would be the following: Whenever the NO_INC identifier is defined (using -DNO_INC when compiling), all calls to T::inc(...) should be optimized away (due to the empty function body). Otherwise, the call to T::inc(...) should trigger an increment by some given value i.

I got two questions regarding this:

  1. Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
  2. I wonder if the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet) although the function body is empty.

Thanks for any advice!


Solution

  • Compiler Explorer is a very useful tool for inspecting the assembly generated for your program. Looking at the assembly is the only way to know for sure whether the compiler optimized something away.

    With the incrementing version of T::inc, your main looks like:

    main:                                   # @main
            push    rax
            test    edi, edi
            jle     .LBB0_1
            lea     eax, [rdi - 1]
            lea     ecx, [rdi - 2]
            imul    rcx, rax
            shr     rcx
            lea     esi, [rcx + rdi]
            add     esi, 41
            jmp     .LBB0_3
    .LBB0_1:
            mov     esi, 42
    .LBB0_3:
            mov     edi, offset std::cout
            call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
            xor     eax, eax
            pop     rcx
            ret
    

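    As you can see, the compiler completely inlined the call to T::inc. It even replaced the loop with a closed-form computation: for argc > 0 the result is (argc - 1) * (argc - 2) / 2 + argc + 41, which equals 42 + argc * (argc - 1) / 2.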

    For an empty T::inc you get:

    main:                                   # @main
            push    rax
            mov     edi, offset std::cout
            mov     esi, 42
            call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
            xor     eax, eax
            pop     rcx
            ret
    

    The compiler optimized away the entire loop!

    Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?

    Yes.

    If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?

    No, for some definition of "complex". A compiler uses heuristics to decide whether inlining a function is worth it, and bases its decision on those heuristics and on nothing else.

    I wonder if the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet) although the function body is empty.

    No, as demonstrated above, the loop doesn't even exist.
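
To make the first assembly listing easier to read, here is a minimal sketch that writes out its arithmetic in C++ and compares it with the original loop. The helper names sum_with_loop and sum_closed_form are hypothetical (they are not part of the question's code), and n stands in for argc. This only mirrors the arithmetic of the listing above; other compilers or flags may generate different code.

```cpp
#include <cassert>

// Same shape as the question's T with the non-empty inc().
struct T {
    static inline void inc(int& v, int i) { v += i; }
};

// Hypothetical helper: the loop from main(), with n standing in for argc.
int sum_with_loop(int n) {
    int a = 42;
    for (int i = 0; i < n; ++i)
        T::inc(a, i);
    return a;
}

// Hypothetical helper: the arithmetic from the optimized listing.
// For n > 0 it computes (n-1)*(n-2)/2 + n + 41, i.e. 42 + n*(n-1)/2.
int sum_closed_form(int n) {
    if (n <= 0)
        return 42;                 // corresponds to the "jle .LBB0_1" path
    long long a = n - 1;           // lea eax, [rdi - 1]
    long long c = n - 2;           // lea ecx, [rdi - 2]
    long long half = (a * c) / 2;  // imul rcx, rax ; shr rcx
    return static_cast<int>(half + n + 41);
}
```

Calling both helpers with the same small n should give the same value, which is why the compiler is free to drop the loop and keep only the formula.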