c++ assembly g++ reduction accumulate

Why does g++ use movabs, and with a weird constant, for a simple reduction?


I'm compiling this simple program:

#include <numeric> 

int main()
{
    int numbers[] = {1, 2, 3, 4, 5};
    auto num_numbers = sizeof(numbers)/sizeof(numbers[0]);
    return std::accumulate(numbers,  numbers + num_numbers, 0);
}

which sums up the integers 1 through 5 and returns that sum (i.e. 15).

I realize std::accumulate can have a bit of trickery in its implementation, but still, this is pretty straightforward. I'm surprised by what I get when compiling it (on Godbolt), though.

With -O3, and with C++ being a language oriented toward compile-time computation, I get the expected result:

main:
        mov     eax, 15
        ret

but if I go down to -O2, which is still fairly heavy optimization, not only do I not get this compile-time computation, I also see this strange piece of assembly:

main:
        movabs  rax, 8589934593
        lea     rdx, [rsp-40]
        mov     ecx, 1
        mov     DWORD PTR [rsp-24], 5
        mov     QWORD PTR [rsp-40], rax
        lea     rsi, [rdx+20]
        movabs  rax, 17179869187
        mov     QWORD PTR [rsp-32], rax
        xor     eax, eax
        jmp     .L3
.L5:
        mov     ecx, DWORD PTR [rdx]
.L3:
        add     rdx, 4
        add     eax, ecx
        cmp     rdx, rsi
        jne     .L5
        ret

Now, .L5 and .L3 I get. What surprises me are these strange movabs instructions, with values moved to and from rax. What do they mean and why are they there?

PS: I compiled using GCC 8.2 for x86_64 with no -march set. If I add -march=skylake, the -O3 output gets messed up too! Edit: This seems to be a regression in GCC; see my GCC bug report. Thanks, @FlorianWeimer!


Solution

  • 8589934593 is 0x200000001 in hexadecimal, and 17179869187 is 0x400000003. Each of these two movabs instructions loads a pair of adjacent int constants (1 and 2, then 3 and 4) into a single 64-bit register, so that one 64-bit store can initialize two array elements on the stack. Since x86-64 is little-endian, the low 32 bits of the register land at the lower address. You can disable this GCC optimization using -fno-store-merging; then you will get something like this at -O2 for the array initialization:

    movl    $1, -40(%rsp)
    …
    …
    movl    $2, -36(%rsp)
    …
    movl    $3, -32(%rsp)
    movl    $4, -28(%rsp)
    movl    $5, -24(%rsp)
    

    The lack of optimization to a single constant looks like a GCC regression, by the way. I do not see this with GCC 6.3. It could actually be related to store merging, which I do not think was part of GCC 6.
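To make the arithmetic behind the two constants concrete, here is a minimal sketch that rebuilds them from the array elements. The helper names pack_pair, low_half and high_half are made up for this illustration and are not anything GCC emits or uses. The sketch only models the values: the low half corresponds to the int stored at the lower address on little-endian x86-64.

```cpp
#include <cstdint>

// Hypothetical helper: combine two 32-bit values into the 64-bit constant
// that a merged store would write. `first` is the element at the lower
// address, which on little-endian x86-64 occupies the low 32 bits.
constexpr std::uint64_t pack_pair(std::uint32_t first, std::uint32_t second)
{
    return (static_cast<std::uint64_t>(second) << 32) | first;
}

// Hypothetical helpers: split a merged constant back into its two halves.
constexpr std::uint32_t low_half(std::uint64_t merged)
{
    return static_cast<std::uint32_t>(merged & 0xFFFFFFFFu);
}

constexpr std::uint32_t high_half(std::uint64_t merged)
{
    return static_cast<std::uint32_t>(merged >> 32);
}

// numbers[0], numbers[1] -> first movabs constant (0x200000001)
static_assert(pack_pair(1, 2) == 8589934593ULL, "1 and 2 pack to 0x200000001");
// numbers[2], numbers[3] -> second movabs constant (0x400000003)
static_assert(pack_pair(3, 4) == 17179869187ULL, "3 and 4 pack to 0x400000003");
```

The fifth element has no partner to merge with, which matches the separate 32-bit store of 5 to [rsp-24] in the -O2 listing.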