Consider the following example code:
void black_box(int* foo);

int foo(int x) {
    black_box(&x);
    return x;
}

int bar(const int x) {
    black_box(const_cast<int*>(&x));
    return x;
}
Quoting cppreference:
A const object is
- an object whose type is const-qualified, or
- [...]
Such object cannot be modified: attempt to do so [..] indirectly (e.g., by modifying the const object through a reference or pointer to non-const type) results in undefined behavior.
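The rule is about the object, not the pointer. Below is a minimal sketch of what it allows. It assumes two hypothetical `black_box` bodies, and `write_through`, `read_through`, `foo_like` and `bar_like` are illustrative names that do not appear in the question. Writing through the pointer is fine when the pointee is a non-const object, as in `foo`. Casting away `const` and only reading is fine even for a const object. Only an actual write to the const object would be undefined behavior.

```cpp
#include <cassert>

// Hypothetical black_box that writes through its argument.
void write_through(int* p) { *p += 1; }

// Hypothetical black_box that only reads through its argument.
int read_through(int* p) { return *p; }

// Like foo: x is a non-const object, so the write is well-defined
// and the function must return the updated value.
int foo_like(int x) {
    write_through(&x);
    return x;
}

// Like bar: x is a const object. Casting away const is allowed,
// and reading through the resulting pointer is well-defined.
int bar_like(const int x) {
    int seen = read_through(const_cast<int*>(&x));
    // write_through(const_cast<int*>(&x));  // undefined behavior: modifies a const object
    assert(seen == x);
    return x;
}
```

With these definitions, `foo_like(41)` returns 42 and `bar_like(7)` returns 7.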
The `x` in `bar` is declared `const`, so the compiler should be allowed to assume that `black_box` will not change it. Despite that, the generated x86-64 assembly for `foo` and `bar` is the same with both gcc and clang: https://godbolt.org/z/jGsKo56aG
push rax
mov dword ptr [rsp + 4], edi
lea rdi, [rsp + 4]
call black_box(int*)@PLT
mov eax, dword ptr [rsp + 4]
pop rcx
ret
For `bar`, why does the compiler not move `x` into a callee-saved register and save the load from memory after the function call?
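In source terms, the optimization being asked about amounts to copying `x` before the call and returning the copy. The sketch below assumes a hypothetical `black_box` body that only reads, because the question leaves it opaque. `bar_by_hand` and `last_seen` are illustrative names.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque black_box: it records what it saw
// but never writes, so passing it a pointer into a const object is fine.
int last_seen = 0;
void black_box(int* p) { last_seen = *p; }

// Same observable behavior as bar for any black_box that avoids undefined
// behavior: x is a const object, so the call cannot legally change it.
int bar_by_hand(const int x) {
    const int saved = x;               // copy taken before the call
    black_box(const_cast<int*>(&x));
    assert(x == saved);                // cannot fail unless black_box invoked UB
    return saved;                      // same value `return x;` would produce
}
```

The same rewrite would be wrong for `foo`, where `black_box` may legally modify `x`. Whether keeping `saved` in a register is actually cheaper than reloading `x` from the stack slot it already occupies is a separate question.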
As so often happens, some ideas sound good until you actually try them. Thanks to @Brian Bi for prompting me to do just that. As it turns out, compilers are usually pretty good at their job.
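If we actually write out what I suggested by hand, we end up with something like the following assembly:
push rbx                        ; rbx is callee-saved, so bar must preserve it
sub rsp, 16                     ; stack slot for `x`; keeps rsp 16-byte aligned at the call
mov ebx, edi                    ; copy `x` (passed in edi) into callee-saved ebx
mov dword ptr [rsp + 12], edi   ; `x` still needs a stack slot, since its address is passed
lea rdi, [rsp + 12]
call black_box(int*)@PLT
mov eax, ebx                    ; restore `x` into the return value register (eax)
add rsp, 16
pop rbx
ret
While this avoids the load from the stack, it costs extra instructions. The return value has to end up in eax, which the call clobbers, so `x` has to be copied into ebx and back. And because rbx is itself callee-saved, bar must save and restore it as well. The compiler's version needs no extra register, because it simply reloads the value from the stack slot that has to exist anyway (its address is passed to black_box). At this point, it is unclear to me whether this would be faster or slower in practice.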
The optimal assembly is probably:
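push rdi                        ; spill `x` (in edi) to the stack; also keeps rsp 16-byte aligned at the call
mov rdi, rsp                    ; pass the address of that stack slot
call black_box(int*)@PLT
pop rax                         ; load `x` into eax as the return value and free the slot
ret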
But compilers have never really been good at these kinds of stack frame layout optimizations.