Search code examples
assemblyx86-64cpu-registerscalling-convention

How to restore x86-64 register saving conventions


fibonacci:
         cmpq    $1, %rdi
         ja      .recursive
         movl    $1, %eax
         ret
.recursive:
         push    %rbp
         push    %r10
         movq    %rdi, %r10
         leaq    -2(%rdi), %rdi
         call    fibonacci
         movq    %rax, %rbp
         leaq    -1(%r10), %rdi
         call    fibonacci
         addq    %rbp, %rax
         pop     %r10
         pop     %rbp
         ret

How do I restore the x86-64 register saving conventions.

I'm not sure how to do this, but here's my train of thought. %r10 is a caller-saved register, so it needs to be saved before and after calls to other methods. In this situation, we need to save %r10 only if it is used after a recursive call to fibonacci. Still, I'm not sure where to add instructions to restore register saving conventions.


Solution

  • %r10 is a caller-saved register, so it needs to be saved before and after calls to other methods.

    Not if you avoid keeping anything precious in it! You're falling into the trap created by the clumsy "caller-saved" terminology, as if that means you have to save it instead of just letting tmp values die. I prefer call-preserved vs. call-clobbered

    Pick a call-preserved register like RBX for values that you want to survive across calls, and save/restore it once at the start/end of your function. Exactly like you're doing with RBP. You could do that with R10, since the only calls your function makes are to itself. But if you want to follow the x86-64 System V calling convention exactly (i.e. only assume that calling yourself preserves the call-preserved registers), use RBX, RBP, or R12..R15. What registers are preserved through a linux x86-64 function call

    As an optimization, you can make do with only 1 call-preserved register by using it for N at first, then for the first Fib result.

    Just for fun I made the input N a 32-bit integer; no need to use 64-bit integer inputs; Fib(94) overflows a 64-bit integer! This should make it easier to see when I'm dealing with N (32-bit registers) vs. Fib(N) (64-bit registers). Remember that writing a 32-bit register implicitly zero-extends into the full 64-bit register.

    ## Recursive is a terribly inefficient way to implement Fibonacci
    ## but if you must do it this way (without memoization or anything), this may be less bad than most, only using 16 bytes of stack per depth of call.
    recursive_fib:    # unsigned long fib(unsigned N)
        cmp    $1, %edi 
        ja    .Lnon_basecase
        mov    %edi, %eax                ## bugfix: Fib(0) = 0, Fib(1) = 1
        ret
    
    .Lnon_basecase:
        push   %rbx              ###### save a call-preserved register
        mov    %edi, %ebx        # keep a copy of N for later
        dec    %edi
        call   recursive_fib            # Fib(N-1)
        lea    -2(%rbx), %edi    # read N
        mov    %rax, %rbx        # replace RBX with Fib(N-1)
        call   recursive_fib            # Fib(N-2)
        add    %rbx, %rax        # retval = Fib(N-2) + Fib(N-1)
        pop    %rbx              ###### restore caller's RBX
        ret
    

    You could of course spill / reload RDI to the stack across the first call, either like a compiler would with a mov into space you reserved earlier with sub $16, %rsp, or with actual push/pop. You'd need to do it this way if you ran out of call-preserved registers in a large function.

    (But hopefully you can choose to spill something that's read-mostly: re-reading memory is cheap, updating memory creates store/reload latency bottlenecks. This is why compilers have smart register-allocation algorithms)

    (You only need 8 bytes of space, but a compiler would maintain x86-64 System V's 16-byte RSP alignment before a call. That means the total RSP offset in any function before another call is 8 + 16*x, and if you start with a push %rbp then you need to leave 16 bytes. By hand you know you're only calling yourself, and that this function doesn't care about stack alignment.)


    BTW, recursive Fibonacci takes O(Fib(N)) time vs. O(N) for a simpler loop that uses the last 2 values. Fun fact, xadd %rdx, %rax in a loop implements Fibonacci, or unroll with 2 add instructions. Assembly Language (x86): How to create a loop to calculate Fibonacci sequence