Let's say I have a hot loop that looks like this (it's in C, but this question applies to compiled imperative languages in general):
int max_dist = some_value;
for (int x = min_x; x <= max_x; ++x)
{
for (int y = min_y; y <= max_y; ++y)
{
if (x * x + y * y > max_dist * max_dist)
continue;
do_something(x, y);
}
}
Would a compiler like Roslyn, MSVC, or GCC be able to optimize the loop to look something like this?
int max_dist = some_value;
int max_dist_sq = max_dist * max_dist;
for (int x = min_x; x <= max_x; ++x)
{
int x_sq = x * x;
for (int y = min_y; y <= max_y; ++y)
{
if (x_sq + y * y > max_dist_sq)
continue;
do_something(x, y);
}
}
Or is that beyond the scope of what a compiler would normally attempt?
This is called loop-invariant code motion (LICM) and it is common. That said, it may not happen even when it looks like it's possible, for example:
It is not a given that moving a computation out of a loop will help, since the computation may be (nearly-)trivial in cost, for different reasons: eg spare ALU capacity in a latency-limited loop, or a "free" operation being done as part of a complex addressing mode. At the same time, moving an operation out of a loop ties up a register with a live-range that spans the entire loop, which may complicate register allocation and even cost an extra spill, so it may not be free.
Depending on the situation, other optimizations may be performed that make LICM unnecessary. For example if a value that was computed in a loop is never used, the computation would likely not be moved but removed entirely. Or if a value is constant, there would likely not be any associated computation. For example this code:
extern void do_something(int x, int y);
void test(int min_x, int max_x, int min_y, int max_y)
{
int max_dist = 9999;
for (int x = min_x; x <= max_x; ++x)
{
for (int y = min_y; y <= max_y; ++y)
{
if (x * x + y * y > max_dist * max_dist)
continue;
do_something(x, y);
}
}
}
Compiled with Clang 18.1.0 -O2
, the inner loop looks like:
.LBB0_4: # Parent Loop BB0_2 Depth=1
mov eax, r12d
imul eax, r12d
add eax, ebx
cmp eax, 99980001
ja .LBB0_6
mov edi, r15d
mov esi, r12d
call do_something(int, int)@PLT
jmp .LBB0_6
y
is squared here, but x * x
is computed outside of this loop, and max_dist * max_dist
is a constant.
As another example, if I change max_dist
such that its square exceeds the range of an int
(triggering UB in a form that is known at compile time) then Clang just nukes the whole thing, so there's not really LICM to speak of..