We know that by x86-64 convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee-saved registers, while %r10 and %r11 are caller-saved registers.
But when I compile C code, in most cases (e.g. when function P calls Q) I see the following assembly code for function Q:
Q:
pushq %rbx
movq %rdx, %rbx
...
popq %rbx
ret
We know that since %rbx is a callee-saved register, we must store it on the stack and restore it for the caller P later.
But wouldn't it be more concise, and save stack operations, to use a caller-saved register such as %r10, as in:
Q:
movq %rdx, %r10
...
ret
so that the callee doesn't need to worry about saving and restoring the register for the caller, because the caller has already pushed it to the stack before calling the callee?
You seem to be mixed up about what "caller-saved" means. I think this bad choice of terminology has fooled you into thinking that compilers actually will save those registers in the caller around function calls. That would usually be slower (see: Why do compilers insist on using a callee-saved register here?), especially in a function that makes more than one call, or calls in a loop.
Better terminology is call-clobbered vs. call-preserved, which reflects how compilers actually use them, and how humans should think about them: registers that die on a function call, or that don't. Compilers don't push/pop a call-clobbered (aka caller-saved) register around each call.
But if you were going to push/pop a value around a single function call, you'd just do that with %rdx. Copying it to %r10 would just be a waste of instructions. So the mov to %r10 is useless: with a later push it's just inefficient, and without one it's incorrect.
The reason for copying to a call-preserved register is so the function arg will survive a function call that the function makes later. Obviously you have to use a call-preserved register for that; call-clobbered registers don't survive function calls.
When a call-preserved register isn't needed, yes, compilers do pick call-clobbered registers.
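To make the two cases concrete, here is a minimal C sketch. The names helper, arg_live_after_call and arg_dead_after_call are hypothetical, not from the question. The first function still needs its argument after the call, so the value has to live somewhere that survives the call; the second has no use for it after the call. The exact asm depends on compiler and options, and since helper is defined in the same file, inter-procedural optimizations may change the register choice. Putting helper in a separate file would be closer to the situation in the question.

```c
/* Hypothetical helper. noinline is a GCC/clang attribute, used here so the
   call is not inlined away when looking at the asm. */
__attribute__((noinline)) int helper(int x) { return x * 2; }

/* a is used again after the call, so it must survive the call.
   Compilers typically keep it in a call-preserved register (e.g. EBX),
   which this function then has to save/restore itself. */
int arg_live_after_call(int a) {
    int r = helper(a);
    return r + a;
}

/* The last use of a is computing helper's argument, before the call.
   Nothing needs to survive the call, so no call-preserved register is needed. */
int arg_dead_after_call(int a) {
    int r = helper(a >> 2);
    return r * 3;
}
```

Compiling both with optimization enabled and comparing the output is a quick way to see whether a push/pop of a call-preserved register appears in only the first function.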
If you expand your example to an actual MCVE instead of just showing the asm without source, this should be clearer. If you write a leaf function that needs a mov to evaluate an expression, or a non-leaf that doesn't need any of its args after the first function-call, you won't see it wasting instructions saving and using a call-preserved reg. e.g.
int foo(int a) {
    return (a>>2) + (a>>3) + (a>>4);
}
https://godbolt.org/z/ceM4dP with GCC and clang -O3:
# gcc10.2
foo(int):
mov eax, edi
mov edx, edi # using EDX, a call-clobbered register
sar edi, 4
sar eax, 2
sar edx, 3
add eax, edx
add eax, edi
ret
Right shift can't be done with LEA to copy-and-operate, and shifting the same input 3 different ways convinces GCC to use mov to copy the input. (Instead of doing a chain of right-shifts: compilers love to minimize latency at the expense of more instructions because that's often best for wide OoO exec.)
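As a rough illustration of the "chain of right-shifts" alternative mentioned above, the same result can be written at the source level so that each shift feeds the next. foo_chain is a hypothetical name, and this is only a sketch of the dependency pattern; a compiler is free to turn either form into the other. It assumes arithmetic right shift for negative int values, which is what GCC and clang do, although C leaves that implementation-defined.

```c
/* Same function as in the answer: three independent shifts of one input.
   Each shift depends only on a, so they can execute in parallel. */
int foo(int a) {
    return (a >> 2) + (a >> 3) + (a >> 4);
}

/* Hypothetical chained version: no copy of the original input is needed
   for the later shifts, but each shift waits for the previous one. */
int foo_chain(int a) {
    int t2 = a >> 2;
    int t3 = t2 >> 1;   /* same value as a >> 3 */
    int t4 = t3 >> 1;   /* same value as a >> 4 */
    return t2 + t3 + t4;
}
```

The chained form has a longer critical path through the shifts, which is the latency cost the answer refers to.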