
x86_64: is the stack frame pointer almost useless?



  • Linux x86_64.
  • gcc 5.x

I was comparing the assembly gcc generates for the same code with and without -fomit-frame-pointer (gcc enables that option by default at "-O3"). Without it, every function gets the usual prologue and epilogue:

pushq    %rbp
movq     %rsp, %rbp
...
popq     %rbp

My question is:

If I omit the frame pointer globally, even, at the extreme, when compiling an operating system, is there a catch?

I know that interrupts use that information, so is the option safe only for user space?


Solution

  • Compilers always generate self-consistent code, so disabling the frame pointer is fine as long as you don't use external or hand-crafted code that makes assumptions about it (for example, by relying on the value of rbp).

    Interrupts don't use the frame pointer information. They may use the current stack pointer to save a minimal context, but this depends on the type of interrupt and on the OS (a hardware interrupt probably uses a Ring 0 stack).
    You can look at the Intel manuals for more information on this.

    About the usefulness of the frame pointer:
    Years ago, after compiling a couple of simple routines and looking at the generated 64-bit assembly code, I had the same question.
    If you don't mind reading a whole lot of notes I wrote for myself back then, here they are.

    Note: Asking about the usefulness of something is a little bit relative. Writing assembly code for the current main 64-bit ABIs, I found myself using the stack frame less and less. However, this is just my coding style and opinion.


    I like using the frame pointer, writing the prologue and epilogue of a function, but I like direct uncomfortable answers too, so here's how I see it:

    Yes, the frame pointer is almost useless in x86_64.

    Beware: it is not completely useless, especially for humans, but a compiler doesn't need it anymore. To understand why we have a frame pointer in the first place, it helps to recall some history.

    Back in the real mode (16 bit) days

    When Intel CPUs supported only "16 bit mode" there were some limitations on how to access the stack. In particular, this instruction was (and still is) illegal

    mov ax, WORD [sp+10h]
    

    because sp cannot be used as a base register. Only a few designated registers could be used for that purpose, for example bx or the more famous bp.
    It is not a detail many people look at nowadays, but bp has an advantage over the other base registers: by default it implies ss as the segment/selector register, just like the implicit uses of sp (by push, pop, etc.), and like esp does on later 32-bit processors.
    Even if your program was scattered all across memory, with each segment register pointing to a different area, bp and sp acted the same. After all, that was the intent of the designers.

    So a stack frame was usually necessary, and consequently so was a frame pointer.
    bp effectively partitioned the stack into four parts: the arguments area, the return address, the old bp (just a WORD) and the local variables area. Each area is identified by the offset used to access it: positive for the arguments and the return address, zero for the old bp, negative for the local variables.
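    The four areas can be checked with a toy model. The sketch below is not real-mode code: it simulates a 16-bit stack in a C array, assuming a near call with two word-sized arguments and two word-sized locals. Stack16, push16, at_bp and build_frame are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 16-bit stack (assumption: near call, word-sized slots).
   sp and bp are byte offsets into mem[]; the stack grows downward. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint16_t mem[STACK_BYTES / 2];
    uint16_t sp;   /* stack pointer */
    uint16_t bp;   /* frame pointer */
} Stack16;

/* Like "push v": lower sp by one word, then store. */
void push16(Stack16 *s, uint16_t v) {
    s->sp -= 2;
    s->mem[s->sp / 2] = v;
}

/* Like "mov ax, [bp+off]": read the word at a bp-relative offset. */
uint16_t at_bp(const Stack16 *s, int off) {
    return s->mem[(s->bp + off) / 2];
}

/* Build the frame of a hypothetical call f(11, 22) with two locals. */
Stack16 build_frame(void) {
    Stack16 s = { .sp = STACK_BYTES, .bp = 0x1234 }; /* caller's bp, arbitrary */
    push16(&s, 22);               /* second argument (pushed first) */
    push16(&s, 11);               /* first argument */
    push16(&s, 0xBEEF);           /* return address, pushed by call */
    push16(&s, s.bp);             /* push bp */
    s.bp = s.sp;                  /* mov bp, sp */
    s.sp -= 4;                    /* sub sp, 4: room for two locals */
    s.mem[(s.bp - 2) / 2] = 100;  /* first local at [bp-2] */
    s.mem[(s.bp - 4) / 2] = 200;  /* second local at [bp-4] */
    return s;
}
```

    In this model at_bp(&s, 4) and at_bp(&s, 6) read the arguments, at_bp(&s, 2) the return address, at_bp(&s, 0) the old bp, and at_bp(&s, -2) and at_bp(&s, -4) the locals, whatever sp does afterwards.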

    Extended effective addresses

    As the Intel CPUs evolved, the more extensive 32-bit addressing modes were added.
    Specifically, any 32-bit general-purpose register can be used as a base register, and this includes esp.
    With instructions like this

    mov eax, DWORD [esp+10h]
    

    now valid, the stack frame and the frame pointer seemed doomed.
    That was not the case, at least in the beginning.
    It is true that it is now possible to use esp for everything, but the separation of the stack into the four areas mentioned above is still useful, especially for humans.

    Without the frame pointer, a push or a pop changes the offset of every argument and local variable relative to esp, giving code that looks unintuitive at first sight. Consider how to implement the following C routine with the cdecl calling convention:

    int my_routine(int a, int b)
    {
        return my_add(a, b);
    }
    

    without and with a stack frame

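    my_routine:
      push DWORD [esp+08h]
      push DWORD [esp+08h]
      call my_add
      add esp, 08h            ; cdecl: the caller removes the arguments
      ret
    
    my_routine:
      push ebp
      mov ebp, esp
    
      push DWORD [ebp+0Ch]
      push DWORD [ebp+08h]
      call my_add
      add esp, 08h            ; cdecl: the caller removes the arguments
      
      pop ebp
      ret 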
    

    At first sight it seems that the first version pushes the same value twice. It actually pushes the two separate arguments however, as the first push lowers esp so the same effective address calculation points the second push to a different argument.
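    The effect of the two identical-looking pushes can be replayed in a small simulation. This is only a sketch under stated assumptions: a 32-bit stack modelled as a C array, with Stack32, push32, at_esp and frameless_pushes as hypothetical names that do not come from any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack; esp is a byte offset into mem[]. */
enum { STACK_BYTES32 = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES32 / 4];
    uint32_t esp;
} Stack32;

/* Like "push v": lower esp by 4, then store. */
void push32(Stack32 *s, uint32_t v) {
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* Like reading "DWORD [esp+off]". */
uint32_t at_esp(const Stack32 *s, uint32_t off) {
    return s->mem[(s->esp + off) / 4];
}

/* Replays the frameless my_routine and records, in order,
   the value each "push DWORD [esp+08h]" actually pushes. */
void frameless_pushes(uint32_t a, uint32_t b, uint32_t out[2]) {
    Stack32 s = { .esp = STACK_BYTES32 };
    push32(&s, b);          /* caller pushes b (cdecl: right to left) */
    push32(&s, a);          /* caller pushes a */
    push32(&s, 0xC0FFEE);   /* call pushes the return address */
    /* On entry: [esp] = return address, [esp+4] = a, [esp+8] = b. */
    out[0] = at_esp(&s, 8); /* first push reads b */
    push32(&s, out[0]);     /* esp drops by 4 ... */
    out[1] = at_esp(&s, 8); /* ... so the same operand now reads a */
    push32(&s, out[1]);
}
```

    Calling frameless_pushes(1, 2, out) leaves b in out[0] and a in out[1]: the operand text is identical, but the address it denotes moves with esp.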

    If you add local variables (especially lots of them) then the situation quickly becomes hard to read: Does mov eax, [esp+0CAh] refer to a local variable or to an argument? With a stack frame we have fixed offsets for the arguments and local variables.

    Even compilers at first still preferred the fixed offsets given by the frame base pointer. I saw this behavior change first with gcc.
    In a debug build the stack frame effectively adds clarity to the code and makes it easy for the (proficient) programmer to follow what is going on and, as pointed out in the comments, lets them recover the stack frame more easily.
    Modern compilers, however, are good at math: they can easily keep track of the stack pointer movements and generate the appropriate offsets from esp, omitting the stack frame for faster execution.

    When a CISC requires data alignment

    Until the introduction of SSE instructions, Intel processors never asked much of programmers compared to their RISC brothers.
    In particular they never asked for data alignment: we could access 32-bit data at an address that is not a multiple of 4 with no major complaint (depending on the DRAM data width, this may result in increased latency).
    SSE introduced 16-byte operands that need to be accessed on a 16-byte boundary (unless an explicitly unaligned instruction such as movups is used). As the SIMD paradigm became efficiently implemented in hardware and more popular, alignment on a 16-byte boundary became important.

    The main 64-bit ABIs now require it: the stack must be aligned on paragraphs (i.e., 16 bytes).
    Now, we are usually called such that after the prologue the stack is aligned, but suppose we are not blessed with that guarantee. We would need to do one of these

    push rbp                   push rbp
    mov rbp, rsp               mov rbp, rsp             
    
    and spl, 0f0h              sub rsp, xxx
    sub rsp, 10h*k             and spl, 0f0h
    

    One way or another the stack is aligned after these prologues. However, we can no longer use a negative offset from rbp to access local variables that need alignment, because the frame pointer itself is not aligned.
    We need to use rsp. We could arrange a prologue that leaves rbp pointing at the top of an aligned area of local variables, but then the arguments would be at unknown offsets.
    We can arrange a complex stack frame (maybe with more than one pointer), but the key feature of the old-fashioned frame base pointer was its simplicity.
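    The arithmetic behind this can be sketched in C. The code below only models register values with integers, assuming the prologue variant that aligns first and then reserves 10h*k bytes; Frame, align_down16 and aligned_prologue are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a 16-byte boundary,
   which is the net effect of "and spl, 0f0h" on rsp. */
uint64_t align_down16(uint64_t addr) {
    return addr & ~(uint64_t)0xF;
}

typedef struct {
    uint64_t rbp;  /* frame pointer after the prologue */
    uint64_t rsp;  /* stack pointer after the prologue */
} Frame;

/* Models: push rbp / mov rbp, rsp / and spl, 0f0h / sub rsp, 10h*k */
Frame aligned_prologue(uint64_t rsp_on_entry, unsigned k) {
    Frame f;
    f.rbp = rsp_on_entry - 8;     /* push rbp; mov rbp, rsp */
    f.rsp = align_down16(f.rbp);  /* and spl, 0f0h */
    f.rsp -= 16u * k;             /* sub rsp, 10h*k */
    return f;
}
```

    With rsp_on_entry = 0x7ffc1004 and k = 2 the model gives rbp = 0x7ffc0ffc (not 16-byte aligned) and rsp = 0x7ffc0fd0, so rbp - rsp is 44; with rsp_on_entry = 0x7ffc1008 the distance is 32. The gap between rbp and the aligned locals depends on the run-time value of rsp, which is why a fixed negative offset from rbp cannot reach them.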

    So we can use the frame pointer to access the arguments on the stack and the stack pointer for the local variables, fair enough.
    Alas, the role of the stack in argument passing has been reduced: for a small number of arguments (four on Windows x64, six integer arguments under the System V ABI used on Linux) it is not even used, and in the future it will probably be used even less.

    So we don't use the frame pointer for local variables (mostly), nor for the arguments (mostly). What do we use it for?

    1. It saves a copy of the original rsp, so a single mov is enough to restore the stack pointer at function exit. If the stack is aligned with an and, which is not invertible, a copy of the original value is necessary.

    2. Actually some ABIs guarantee that after the standard prologue the stack is aligned thereby allowing us to use the frame pointer as usual.

    3. Some variables don't need alignment and can be accessed with an unaligned frame pointer, this is usually true for hand crafted code.

    4. Some functions require more parameters than fit in registers, so the remaining ones still go on the stack.
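    Point 1 can be illustrated with plain integer arithmetic. The sketch below assumes the usual epilogue mov rsp, rbp / pop rbp and models registers as integers; align16 and prologue_then_epilogue are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Net effect of "and spl, 0f0h" on rsp: clear the low four bits. */
uint64_t align16(uint64_t rsp) {
    return rsp & ~(uint64_t)0xF;
}

/* Runs an aligning prologue followed by the standard epilogue and
   returns the final rsp, which should equal the value on entry. */
uint64_t prologue_then_epilogue(uint64_t rsp_on_entry, unsigned k) {
    uint64_t rsp = rsp_on_entry;
    rsp -= 8;                      /* push rbp */
    uint64_t rbp = rsp;            /* mov rbp, rsp: the saved copy */
    rsp = align16(rsp) - 16u * k;  /* and spl, 0f0h; sub rsp, 10h*k */
    rsp = rbp;                     /* mov rsp, rbp: undo both steps at once */
    rsp += 8;                      /* pop rbp */
    return rsp;
}
```

    align16 maps both 0x7ffc1004 and 0x7ffc1008 to 0x7ffc1000, so the aligned value alone cannot tell us where rsp started; the copy kept in rbp is what makes the epilogue work for any entry value and any k.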

    Summary

    The frame pointer is a vestigial paradigm from 16-bit programs that has proven itself still useful on 32-bit machines because of its simplicity and clarity when accessing local variables and arguments.
    On 64-bit machines, however, the strict alignment requirements take away most of that simplicity and clarity; the frame pointer remains in use in debug builds.


    As for the frame pointer being used to do fun things: that is true, I guess. I've never seen such code, but I can imagine how it would work.
    I focused, however, on the housekeeping role of the frame pointer, as this is the way I have always seen it.
    All the crazy things can be done with any pointer set to the same value as the frame pointer; I give the latter a more "special" role.
    VS2013, for example, sometimes uses rdi as a "frame pointer", but I don't consider it a real frame pointer if it doesn't use rbp/ebp/bp.
    To me the use of rdi means a Frame Pointer Omission optimization :)