
Strange unnecessary stack usage by GCC and Clang


I was using Compiler Explorer and noticed that GCC and Clang emit seemingly unnecessary stack-related instructions when compiling this simple function (Compiler Explorer).

void bar(void);

int foo(void) {
    bar();
    return 42;
}

Here is the compiler output (also visible in Compiler Explorer via the link above). Passing -mabi=sysv has no effect on the generated assembly; I added it only to rule out the ABI as the cause of the strange output.

// Expected output:
foo:
        call    bar
        mov     eax, 42
        ret

// gcc -O3 -mabi=sysv
// Why is it reserving unused space in the stack frame?
foo:
        sub     rsp, 8
        call    bar
        mov     eax, 42
        add     rsp, 8
        ret

// clang -O3 -mabi=sysv
// Why is it preserving a scratch register then moving it to another unused scratch register?
foo:
        push    rax
        call    bar@PLT
        mov     eax, 42
        pop     rcx
        ret

Why is the stack pointer adjusted even though the function does not use the stack?

I found this strange, since removing these instructions seems like an easy optimization for major compilers like GCC and Clang to perform when targeting a known ABI.

I have a few theories, but I was hoping to get some clarification.

  • Maybe this is done to prevent an infinite loop if bar calls foo recursively? By consuming a small amount of stack space on each call, the compiler would ensure that the program eventually segfaults when it runs out of stack. Maybe Clang is doing the same thing but uses push and pop to allow better pipelining in some situations? If so, are there any command-line options that disable this behavior? That said, this seems like a non-issue, since on x86-64 call already pushes the return address onto the stack.
  • Maybe there is some quirk of C or the AMD64 System V ABI that I am unaware of?
  • Perhaps I am overthinking this and the strange assembly is simply the result of poor register/stack optimization. Maybe the stack was used at some point during compilation, but after those uses were optimized away the compiler was unable to remove the leftover stack adjustment.

Solution

  • Alignment.

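    The call instruction pushes 8 bytes onto the stack (the return address). If the stack pointer was 16-byte aligned just before the call to foo, it is therefore off by 8 on entry to foo. The optimized functions adjust it by another 8 bytes so that it is 16-byte aligned again when bar is called.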

    This is a requirement of the System V AMD64 ABI: the stack pointer must be 16-byte aligned at the point of every call. I believe the rationale is to ensure that 128-bit SSE register values can be spilled to naturally aligned addresses, which avoids a performance hit or a fault, depending on the instruction used and the CPU configuration. It also allows SSE instructions to be used for optimized block moves to and from suitably aligned addresses.

    The clang and gcc cases are effectively identical: you don't care what was written to that stack slot or which call-clobbered register was overwritten, only that the stack pointer was adjusted.
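
The alignment arithmetic above can be checked with a small simulation. This is only a sketch: it models the stack pointer as a plain integer rather than reading the real rsp, and the helper names (is_aligned16, after_call, after_pad, rsp_at_call_bar) are hypothetical, introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is the simulated stack pointer 16-byte aligned? */
static int is_aligned16(uint64_t rsp) {
    return (rsp % 16) == 0;
}

/* Simulated effect of `call`: it pushes an 8-byte return address. */
static uint64_t after_call(uint64_t rsp) {
    return rsp - 8;
}

/* Simulated effect of `sub rsp, 8` (GCC) or `push rax` (Clang). */
static uint64_t after_pad(uint64_t rsp) {
    return rsp - 8;
}

/*
 * Simulated rsp at the `call bar` inside foo, given the rsp value at the
 * `call foo` site. `pad` selects whether foo performs the extra 8-byte
 * adjustment that both compilers emit.
 */
static uint64_t rsp_at_call_bar(uint64_t rsp_at_call_foo, int pad) {
    /* Precondition assumed from the ABI: aligned at the call site. */
    assert(is_aligned16(rsp_at_call_foo));
    uint64_t rsp = after_call(rsp_at_call_foo); /* value on entry to foo */
    if (pad) {
        rsp = after_pad(rsp);
    }
    return rsp;
}
```

With an arbitrary aligned starting value such as 0x7fffffffe000, rsp_at_call_bar(..., 1) yields a multiple of 16, while rsp_at_call_bar(..., 0) yields a value that is 8 past a 16-byte boundary. The latter is what the "expected" three-instruction version of foo would hand to bar.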