Tags: language-agnostic, compiler-optimization

Can compilers optimize calculations within for-loops that don't depend on the for-loop variable?


Let's say I have a hot loop that looks like this (it's in C, but this question applies to compiled imperative languages in general):

int max_dist = some_value;

for (int x = min_x; x <= max_x; ++x)
{
    for (int y = min_y; y <= max_y; ++y)
    {
        if (x * x + y * y > max_dist * max_dist)
            continue;
        
        do_something(x, y);
    }
}

Would a compiler like Roslyn, MSVC, or GCC be able to optimize the loop to look something like this?

int max_dist = some_value;
int max_dist_sq = max_dist * max_dist;

for (int x = min_x; x <= max_x; ++x)
{
    int x_sq = x * x;
    for (int y = min_y; y <= max_y; ++y)
    {
        if (x_sq + y * y > max_dist_sq)
            continue;
        
        do_something(x, y);
    }
}

Or is that beyond the scope of what a compiler would normally attempt?


Solution

  • This is called loop-invariant code motion (LICM) and it is common. That said, it may not happen even when it looks like it's possible, for example:

    • The compiler has to know, for sure, that the relevant computation is invariant, and can be moved. That analysis can be harder for a compiler to figure out than it is for a human.
    • Lifting the computation out of the loop may not be profitable according to a heuristic cost assigned by the compiler. That decision may be correct or incorrect; after all, it is based on a heuristic, not on benchmarking both options.

    It is not a given that moving a computation out of a loop will help: the computation may be (nearly) trivial in cost for various reasons, e.g. spare ALU capacity in a latency-limited loop, or a "free" operation folded into a complex addressing mode. At the same time, hoisting an operation out of a loop ties up a register with a live range that spans the entire loop, which can complicate register allocation and even cost an extra spill, so it is not necessarily free.
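    The invariance analysis mentioned above can fail for reasons that are invisible at a glance. A common one (my own sketch, with hypothetical names like count_in_circle and count_point, not from the answer) is a value read from a global or through a pointer: if the loop calls a function whose body the compiler cannot see, it must assume that call might modify the global, so the computation is no longer provably invariant. Copying the value into a local first restores the guarantee:

    ```c
    #include <stdio.h>

    /* A global: the compiler must assume any opaque call inside the
     * loop could modify it, which defeats hoisting threshold * threshold.
     * (Here count_point is visible, so a real compiler may still see
     * through it; imagine it defined in another translation unit.) */
    int threshold = 100;

    static long hits;
    void count_point(int x, int y) { (void)x; (void)y; hits++; }

    long count_in_circle(int min_x, int max_x, int min_y, int max_y)
    {
        hits = 0;
        for (int x = min_x; x <= max_x; ++x)
            for (int y = min_y; y <= max_y; ++y)
                if (x * x + y * y <= threshold * threshold)
                    count_point(x, y);
        return hits;
    }

    /* Manually hoisted version: reading threshold once into a local
     * tells the compiler the value cannot change mid-loop, so the
     * square is computed exactly once regardless of what the
     * invariance analysis can prove about count_point. */
    long count_in_circle_hoisted(int min_x, int max_x, int min_y, int max_y)
    {
        hits = 0;
        int t_sq = threshold * threshold;
        for (int x = min_x; x <= max_x; ++x) {
            int x_sq = x * x;
            for (int y = min_y; y <= max_y; ++y)
                if (x_sq + y * y <= t_sq)
                    count_point(x, y);
        }
        return hits;
    }

    int main(void)
    {
        /* Both versions count the same 21 x 21 = 441 points. */
        printf("%ld %ld\n", count_in_circle(-10, 10, -10, 10),
                            count_in_circle_hoisted(-10, 10, -10, 10));
        return 0;
    }
    ```

    The two versions compute the same result; the difference is only in what the compiler is allowed to assume while optimizing.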

    Depending on the situation, other optimizations may be performed that make LICM unnecessary. For example, if a value computed in a loop is never used, the computation would likely not be moved but removed entirely (dead-code elimination). Or if a value is constant, there would likely be no associated computation left at all (constant folding). For example, this code:

    extern void do_something(int x, int y);
    
    void test(int min_x, int max_x, int min_y, int max_y)
    {
        int max_dist = 9999;
    
        for (int x = min_x; x <= max_x; ++x)
        {
            for (int y = min_y; y <= max_y; ++y)
            {
                if (x * x + y * y > max_dist * max_dist)
                    continue;
                
                do_something(x, y);
            }
        }
    }
    

    Compiled with Clang 18.1.0 -O2, the inner loop looks like:

    .LBB0_4:                                #   Parent Loop BB0_2 Depth=1
            mov     eax, r12d
            imul    eax, r12d
            add     eax, ebx
            cmp     eax, 99980001
            ja      .LBB0_6
            mov     edi, r15d
            mov     esi, r12d
            call    do_something(int, int)@PLT
            jmp     .LBB0_6
    

    y is squared here, but x * x has been hoisted outside of this loop, and max_dist * max_dist has been constant-folded to 99980001 (9999 squared).

    As another example, if I change max_dist such that its square exceeds the range of an int (triggering UB in a form that is known at compile time), then Clang just nukes the whole thing, so there's not really any LICM to speak of.
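    Incidentally, if you did need a max_dist whose square exceeds INT_MAX, one way to keep the comparison well-defined (my own sketch, not part of the answer) is to widen one operand to long long before squaring. The widened product is then still an ordinary loop-invariant value that the compiler can hoist or fold as before:

    ```c
    #include <stdio.h>

    int main(void)
    {
        int max_dist = 50000;   /* 50000 * 50000 = 2.5e9 overflows a 32-bit int */

        /* Casting one operand widens the multiplication to long long,
         * so the product is well-defined instead of signed-overflow UB. */
        long long max_dist_sq = (long long)max_dist * max_dist;

        printf("%lld\n", max_dist_sq);
        return 0;
    }
    ```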