c assembly x86-64 compiler-optimization calling-convention

Why don't callees use caller-saved registers first?

We know that under the x86-64 calling convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee-saved, while %r10 and %r11 are caller-saved. But when I compile C code, in most cases (e.g. function P calls Q) I see the following assembly for function Q:

Q:
   pushq %rbx
   movq %rdx, %rbx
   ...
   popq %rbx
   ret

Since %rbx is a callee-saved register, Q must save it on the stack and restore it before returning to the caller P.

But wouldn't it be more concise, and avoid the stack operations, to use the caller-saved register %r10 instead:

Q:
   movq %rdx, %r10
   ...
   ret

That way the callee wouldn't need to save and restore the register for the caller, because the caller has already pushed it on the stack before calling the callee. Isn't that right?

Solution

You seem to be mixed up about what "caller-saved" means. I think this bad choice of terminology has fooled you into thinking that compilers actually save these registers in the caller around function calls. That would usually be slower (see Why do compilers insist on using a callee-saved register here?), especially in a function that makes more than one call, or makes calls in a loop.

Better terminology is call-clobbered vs. call-preserved, which reflects how compilers actually use them, and how humans should think about them: registers that die on a function call, or that don't. Compilers don't push/pop a call-clobbered (aka caller-saved) register around each call.
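To make the "calls in a loop" case concrete, here is a minimal C sketch. The names `step` and `sum_calls` are hypothetical, and `__attribute__((noinline))` is a GCC/Clang extension used only so the call stays visible in the asm. Several values are needed again after every call, so a compiler will typically keep them in call-preserved registers saved once in the prologue, rather than push/pop call-clobbered registers around each of the `n` calls. Paste it into Godbolt at -O2 to check what your compiler actually does.

```c
/* Hypothetical stand-in for any function the compiler can't inline. */
__attribute__((noinline)) long step(long x) { return x * 3; }

/* i, n, k and sum are all needed again after every call to step().
   Keeping them in call-preserved registers (e.g. %rbx, %rbp, %r12)
   costs one push/pop pair per register for the whole function;
   keeping them in call-clobbered registers would instead need a
   save/restore around every one of the n calls. */
long sum_calls(long n, long k) {
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += step(i) * k;
    return sum;
}
```

The prologue/epilogue cost is fixed, while per-call saving scales with the number of calls, which is why compilers prefer call-preserved registers for values that must survive calls.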

But if you were going to push/pop a value around a single function call, you'd just push/pop %rdx itself; copying it to %r10 first would waste an instruction. So the mov to %r10 is useless: followed by a later push of %r10 it's merely inefficient, and without that push it's incorrect, because the call is allowed to clobber %r10.

The reason for copying to a call-preserved register is so the function arg survives a call that the function itself makes later. Obviously you have to use a call-preserved register for that; values in call-clobbered registers don't survive function calls.
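Here is a minimal C sketch of a Q that produces the pattern from the question, assuming the x86-64 System V ABI (where the third integer arg arrives in %rdx, as in the question's asm). `helper` is a hypothetical callee and `noinline` is a GCC/Clang extension that keeps the call from being inlined away. Because `z` is still needed after the call, a compiler will typically copy it from %rdx into %rbx, saving the caller's %rbx first.

```c
/* Hypothetical callee; noinline keeps the call visible in the asm. */
__attribute__((noinline)) long helper(long x) { return x + 1; }

/* z arrives in %rdx (System V). The call to helper() may clobber %rdx,
   and z is needed afterwards, so it has to live in a call-preserved
   register such as %rbx across the call: hence push %rbx / mov %rdx,%rbx
   ... pop %rbx. */
long Q(long x, long y, long z) {
    (void)y;  /* unused; only here so that z is the 3rd arg */
    return helper(x) + z;
}
```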

When a call-preserved register isn't needed, compilers do pick call-clobbered registers.
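For contrast, a sketch of a non-leaf function that needs none of its args after the call (again assuming System V, with hypothetical names `consume` and `Q2`). Nothing has to survive the call, so there is no reason to touch %rbx: the sum can be computed straight into %rdi, a call-clobbered arg register. GCC and clang at -O2 will often go further and emit a tail call (`jmp consume`), though that is worth confirming on Godbolt.

```c
/* Hypothetical callee; noinline keeps the call visible in the asm. */
__attribute__((noinline)) long consume(long x) { return x * 2; }

/* x + z is only needed as the argument to consume(), and nothing from
   Q2 is used after the call, so a call-clobbered register suffices and
   no push/pop of a call-preserved register is needed. */
long Q2(long x, long y, long z) {
    (void)y;  /* unused; only here so that z is the 3rd arg */
    return consume(x + z);
}
```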

If you expand your example into an actual MCVE, instead of showing only the asm without the source, this should become clearer. If you write a leaf function that needs a mov to evaluate an expression, or a non-leaf function that doesn't need any of its args after its first function call, you won't see the compiler waste instructions saving and using a call-preserved register. For example:

int foo(int a) {
    return (a>>2) + (a>>3) + (a>>4);
}

https://godbolt.org/z/ceM4dP with GCC and clang -O3:

# gcc10.2
foo(int):
        mov     eax, edi
        mov     edx, edi      # using EDX, a call-clobbered register
        sar     edi, 4
        sar     eax, 2
        sar     edx, 3
        add     eax, edx
        add     eax, edi
        ret

A right shift can't be done with LEA as a copy-and-operate, and shifting the same input three different ways convinces GCC to use mov to copy the input, instead of doing a chain of right shifts. Compilers like to minimize latency at the expense of extra instructions, because that's often best for wide out-of-order execution.
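The two strategies can be sketched as C dataflow. This only illustrates the dependency structure: `foo_parallel` and `foo_chain` are hypothetical names, and a compiler may well fold `(a >> 2) >> 1` back into `a >> 3`, so the C source doesn't force either asm shape. It also assumes `>>` on a negative `int` is an arithmetic shift, which is implementation-defined in C but true for GCC and clang on x86-64.

```c
/* GCC's strategy: three independent shifts of copies of a.
   None depends on another, so they can execute in parallel. */
int foo_parallel(int a) {
    int t2 = a >> 2, t3 = a >> 3, t4 = a >> 4;
    return t2 + t3 + t4;
}

/* The rejected alternative: a chain of right shifts. It needs fewer
   copies of a, but each shift must wait for the previous result,
   so the dependency chain (and thus latency) is longer. */
int foo_chain(int a) {
    int t2 = a >> 2;
    int t3 = t2 >> 1;  /* equals a >> 3 for an arithmetic shift */
    int t4 = t3 >> 1;  /* equals a >> 4 */
    return t2 + t3 + t4;
}
```

Both compute the same result as `foo`; the difference is only in how long the critical path is.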