Search code examples
assemblyx86-64stack-memoryabired-zone

Why do we need stack allocation when we have a red zone?


I have the following doubts:

As we know System V x86-64 ABI gives us about a fixed-size area (128 bytes) in the stack frame, so called redzone. So, as a result we don't need to use, for example, sub rsp, 12. Just make mov [rsp-12], X and that's all.

But I cannot grasp idea of that. Why does it matter? Is it necessary to sub rsp, 12 without redzone? After all, stack size is limited at the beginning so why sub rsp, 12 is important? I know that it makes possible us to follow the top of the stack but let's ignore it at that moment.

I know what some instructions use rsp value ( like ret) but don't care about it in that moment.

The crux of the problem is: We have no redzone and we've done:

function:
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    ret

Is it difference with?

function:
    sub rsp, 1024
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    add rsp, 1024
    ret

Solution

  • The "red zone" is not strictly necessary. In your terms, it could be considered "pointless." Everything that you could do using the red zone, you could also do the traditional way that you did it targeting the IA-32 ABI.

    Here's what the AMD64 ABI says about the "red zone":

    The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

    The real purpose of the red zone is as an optimization. Its existence allows code to assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, which makes it possible to use it as scratch space. This makes it unnecessary to explicitly create scratch space on the stack by moving the stack pointer in rsp. This is an optimization because the instructions to decrement and restore rsp can now be elided, saving time and space.

    So yes, while you could do this with AMD64 (and would need to do it with IA-32):

    function:
        push rbp                      ; standard "prologue" to save the
        mov  rbp, rsp                 ;   original value of rsp
    
        sub  rsp, 32                  ; reserve scratch area on stack
        mov  QWORD PTR [rsp],   rcx   ; copy rcx into our scratch area
        mov  QWORD PTR [rsp+8], rdx   ; copy rdx into our scratch area
    
        ; ...do something that clobbers rcx and rdx...
    
        mov  rcx, [rsp]               ; retrieve original value of rcx from our scratch area
        mov  rdx, [rsp+8]             ; retrieve original value of rdx from our scratch area
        add  rsp, 32                  ; give back the stack space we used as scratch area
    
        pop  rbp                      ; standard "epilogue" to restore rsp
        ret
    

    we don't need to do it in cases where we only need a 128-byte scratch area (or smaller), because then we can use the red zone as our scratch area.

    Plus, since we no longer have to decrement the stack pointer, we can use rsp as the base pointer (instead of rbp), making it unnecessary to save and restore rbp (in the prologue and epilogue), and also freeing up rbp for use as another general-purpose register!

    (Technically, turning on frame-pointer omission (-fomit-frame-pointer, enabled by default with -O1 since the ABI allows it) would also make it possible for the compiler to elide the prologue and epilogue sections, with the same benefits. However, absent a red zone, the need to adjust the stack pointer to reserve space would not change.)

    Note, however, that the ABI only guarantees that asynchronous things like signals and interrupt handlers not modify the red zone. Calls to other functions may clobber values in the red zone, so it is not particularly useful except in leaf functions (which those functions that do not call any other functions, as if they were at the "leaf" of a function-call tree).


    A final point: the Windows x64 ABI deviates slightly from the AMD64 ABI used on other operating systems. In particular, it has no concept of a "red zone". The area beyond rsp is considered volatile and subject to be overwritten at any time. Instead, it requires that the caller allocate a home address space on the stack, which is then available for the callee's use in the event that it needs to spill any of the register-passed parameters.