c, inline, compiler-optimization, ram

Is it worth inlining a function containing one tight loop?


Suppose we have a function that contains only one tight loop, e.g.:

int Prime(int n) {
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

Is it worth inlining this function, i.e. declaring it as inline int Prime(int n)? For the purpose of this question, assume the compiler really does substitute the function body at the call site when the function is marked inline. The question is then whether that is a good thing to do in this case.

I would say no: if the CPU executes a call instruction and enters this function with everything else pushed onto the stack, then the integers i and n might fit into registers and the whole for loop would run on registers only.

However, if we inline the function and the code gets inserted as a part of some bigger block, then i and n might end up on the stack and the for loop would be accessing RAM to continuously query their values.

Is this a reason to not use the keyword inline for such function?

Are there other reasons why we should not inline such function?


Solution

  • However, if we inline the function and the code gets inserted as a part of some bigger block, then i and n might end up on the stack and the for loop would be accessing RAM to continuously query their values.

    Not really. No sane compiler, when run with proper optimization flags, will keep the variables of a loop this small in RAM. At worst, even if the arguments are passed on the stack, you will see one or more moves into registers before the start of the loop (with push/pop instructions before and after the loop to preserve the original values of those registers where needed).

    For example:

    push regA   ; save original regA
    mov regA, 2 ; use regA as i
    ; ... loop ...
    pop regA    ; restore regA
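
To check this on a real toolchain, one option is to place the function inside a larger caller and read the generated assembly. The following is a minimal sketch, not the asker's actual code: CountPrimes is a hypothetical caller invented for illustration, and the n >= 2 guard is added only because Prime as written returns 1 for 0 and 1.

```c
/* Same loop as in the question. 'static inline' allows, but does not
   force, the compiler to substitute the body at the call site. */
static inline int Prime(int n) {
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical caller (an assumption for illustration): it keeps its
   own locals (lo, hi, n, count) live around the call to Prime. */
int CountPrimes(int lo, int hi) {
    int count = 0;
    for (int n = lo; n <= hi; n++) {
        /* Skip 0 and 1, which Prime would report as prime. */
        if (n >= 2 && Prime(n)) {
            count++;
        }
    }
    return count;
}
```

With GCC or Clang, compiling this file with -O2 -S writes the assembly to a .s file, where you can see whether the inner loop touches memory at all. The exact output depends on the compiler, version, and target.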
    

    About the inline keyword: it is purely a hint. Modern compilers make their own inlining decisions and are free to ignore it. If you want to force inlining, you can use compiler-specific attributes, for example with GCC or Clang:

    inline __attribute__((always_inline)) int Prime(int n) { ...
    

    Would it be worth inlining such a function? It is not possible to say beforehand. One can only guess, and even guessing requires seeing the code around the actual function call. In general, you can only tell whether inlining helps by testing and profiling the inlined version against the non-inlined version.
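
A rough way to run that comparison with GCC or Clang is to build both variants side by side and time them. This is only a sketch under assumptions: PrimeInlined, PrimeCalled, CountInlined, CountCalled and SecondsFor are hypothetical names made up for this example, and clock() gives coarse CPU-time measurements, so a real profiler or repeated runs would be more trustworthy.

```c
#include <time.h>

/* Variant the compiler is told to always inline (GCC/Clang attribute). */
static inline __attribute__((always_inline)) int PrimeInlined(int n) {
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/* Variant the compiler is told never to inline (GCC/Clang attribute). */
static __attribute__((noinline)) int PrimeCalled(int n) {
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical workloads: count the primes in [2, limit]. */
int CountInlined(int limit) {
    int count = 0;
    for (int n = 2; n <= limit; n++) {
        count += PrimeInlined(n);
    }
    return count;
}

int CountCalled(int limit) {
    int count = 0;
    for (int n = 2; n <= limit; n++) {
        count += PrimeCalled(n);
    }
    return count;
}

/* Runs one workload, stores its result, and returns the CPU time
   it took in seconds as measured by clock(). */
double SecondsFor(int (*work)(int), int limit, int *result) {
    clock_t start = clock();
    *result = work(limit);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

From main you would call SecondsFor(CountInlined, limit, &r) and SecondsFor(CountCalled, limit, &r) with the same limit, compile with optimizations enabled (for example -O2), and compare the two timings. Whichever wins on one machine and compiler is not guaranteed to win elsewhere.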

    Are there other reasons why we should not inline such function?

    As @P__J__ notes in the comments, inlining increases the memory footprint of the program if the inlined function is called in many places, because each call site gets its own copy of the body. For that reason, many compilers limit the size of a particular inlined function and/or of all inlined code together. In most cases, inlining increases the size of the program code.

    If your goal is to have a very small program, then inlining is not a good idea.