I was using Compiler Explorer and noticed GCC and Clang would emit seemingly unnecessary instructions related to the stack when compiling this simple function (Compiler Explorer).
void bar(void);

int foo(void) {
    bar();
    return 42;
}
Here is the result of the compilation (also visible in Compiler Explorer via the link above). -mabi=sysv has no effect on the output assembly, but I included it to rule out the ABI as the cause of the strange assembly.
// Expected output:
foo:
    call bar
    mov eax, 42
    ret

// gcc -O3 -mabi=sysv
// Why is it reserving unused space in the stack frame?
foo:
    sub rsp, 8
    call bar
    mov eax, 42
    add rsp, 8
    ret

// clang -O3 -mabi=sysv
// Why is it preserving a scratch register then moving it to another unused scratch register?
foo:
    push rax
    call bar@PLT
    mov eax, 42
    pop rcx
    ret
I found this strange, since it seems like an easy optimization for major compilers like GCC and Clang to perform when working with a known ABI.
I have a couple of theories, but I was hoping to get some clarification.
Is this in case bar calls foo recursively? By consuming a small amount of stack space on each call, the compiler would ensure that the program eventually segfaults when it runs out of stack space. Maybe Clang is doing the same thing, but uses push and pop to allow for better pipelining in some situations? If this is the case, are there any CLI arguments I can use to disable this behavior? However, this seems like a non-issue, since call pushes the return address (rip) onto the stack anyway on x86-64.
Alignment.
The call instruction pushes 8 bytes onto the stack (the return address), so on entry to foo the stack pointer is 8 bytes off a 16-byte boundary. The compiled functions adjust it by another 8 bytes so that the stack pointer is 16-byte aligned again before the call to bar.
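The arithmetic can be checked with a small model of the stack pointer. This is only a sketch: the helper names (rsp_after_call, rsp_after_pad, rsp_at_inner_call) are made up for illustration, and it assumes the caller of foo had a 16-byte-aligned rsp at the call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: `call` pushes an 8-byte return address,
   which lowers the stack pointer by 8. */
uint64_t rsp_after_call(uint64_t rsp) { return rsp - 8; }

/* Hypothetical helper: models `sub rsp, 8` (GCC) or `push rax` (Clang).
   Either way the stack pointer drops by another 8 bytes. */
uint64_t rsp_after_pad(uint64_t rsp) { return rsp - 8; }

int is_aligned16(uint64_t rsp) { return rsp % 16 == 0; }

/* Given the stack pointer at the site that calls foo, return the stack
   pointer at the moment foo executes `call bar`. */
uint64_t rsp_at_inner_call(uint64_t rsp_before_call) {
    /* Assumption: the caller's stack pointer is 16-byte aligned here. */
    assert(is_aligned16(rsp_before_call));
    uint64_t at_entry = rsp_after_call(rsp_before_call); /* 8 mod 16 */
    return rsp_after_pad(at_entry);                      /* 0 mod 16 again */
}
```

For any 16-byte-aligned starting value, rsp_after_call yields a value that is 8 modulo 16, and rsp_at_inner_call yields one that is a multiple of 16 again, which is the state bar expects to be called in.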
I believe this is a requirement of the ABI, so that 128-bit SSE register values can be spilled to naturally aligned addresses, which is important to avoid a performance hit or a fault, depending on CPU configuration. It also allows SSE instructions to be used for optimized block moves from suitably aligned addresses.
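As a rough illustration of why a callee benefits, the sketch below declares a 16-byte-aligned local and reports its address modulo 16. The type vec128 and the function spill_slot_offset are hypothetical stand-ins for a spilled SSE value, not anything from the compilers' output. Because the incoming stack alignment is known, a compiler can typically place such a slot at a fixed offset from rsp without realigning the stack at run time.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit SSE value spilled to the stack. */
typedef struct {
    alignas(16) unsigned char bytes[16];
} vec128;

static_assert(alignof(vec128) == 16, "vec128 should require 16-byte alignment");

/* Returns the address of a local vec128 modulo 16.
   The compiler must honor the alignment, so this is expected to be 0. */
unsigned spill_slot_offset(void) {
    volatile vec128 slot;  /* volatile keeps the object in memory */
    slot.bytes[0] = 1;
    return (unsigned)((uintptr_t)(const volatile void *)&slot % 16);
}
```

Comparing the assembly for spill_slot_offset with and without the alignas specifier in Compiler Explorer should show whether any extra stack adjustment is emitted for your target.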
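The Clang and GCC cases are effectively identical: you don't really care what was written to that stack slot, or which volatile (caller-saved) register was overwritten by the pop, only that the stack pointer was adjusted by 8 bytes. Clang presumably prefers push and pop because each encodes in a single byte, whereas sub rsp, 8 and add rsp, 8 take four bytes each.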