c assembly x86-64 buffer-overflow stack-memory

Difference in x86-32 and x64 Assembly stack allocation for a fixed-size buffer with unoptimized C (GCC)


While doing some basic disassembly, I noticed that my buffer is given more stack space than its declared size. A tutorial I am following uses the same code, but there only the expected 500 bytes are allocated. Why is this?

My code:

#include <stdio.h>
#include <string.h>

int main (int argc, char** argv){
    char buffer[500];
    strcpy(buffer, argv[1]);
    return 0;
}

Compiled with GCC, the disassembled code is:

   0x0000000000001139 <+0>:     push   %rbp
   0x000000000000113a <+1>:     mov    %rsp,%rbp
   0x000000000000113d <+4>:     sub    $0x210,%rsp
   0x0000000000001144 <+11>:    mov    %edi,-0x204(%rbp)
   0x000000000000114a <+17>:    mov    %rsi,-0x210(%rbp)
   0x0000000000001151 <+24>:    mov    -0x210(%rbp),%rax
   0x0000000000001158 <+31>:    add    $0x8,%rax
   0x000000000000115c <+35>:    mov    (%rax),%rdx
   0x000000000000115f <+38>:    lea    -0x200(%rbp),%rax
   0x0000000000001166 <+45>:    mov    %rdx,%rsi
   0x0000000000001169 <+48>:    mov    %rax,%rdi
   0x000000000000116c <+51>:    call   0x1030 <strcpy@plt>
   0x0000000000001171 <+56>:    mov    $0x0,%eax
   0x0000000000001176 <+61>:    leave  
   0x0000000000001177 <+62>:    ret  

However, the disassembly in this video clearly allocates only 500 bytes: https://www.youtube.com/watch?v=1S0aBV-Waeo


Why is this the case? The only difference I can see is that the video's build is 32-bit while mine is x86-64.


Solution

  • 500 is not a multiple of 16.

    The x86-64 ABI (application binary interface) requires the stack pointer to be a multiple of 16 immediately before every call instruction. (Since call pushes an 8-byte return address, this means the stack pointer is always congruent to 8, mod 16, when control reaches the first instruction of a called function.) In your code, push %rbp makes the stack pointer a multiple of 16 again, so the compiler can keep it that way simply by rounding up the value it uses in the sub instruction to a multiple of 16. In your listing the buffer occupies the 512 bytes (0x200) starting at -0x200(%rbp), and the remaining 16 bytes of the 0x210 hold the saved copies of argc (%edi) and argv (%rsi).

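    The original x86-32 ABI did not make this requirement (it asked only for 4-byte stack alignment), so there was no reason for the compiler used in the video to increase the size of the stack frame. Recent GCC versions may keep the stack 16-byte aligned on 32-bit targets as well, so a modern toolchain will not necessarily reproduce the video's output.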

    Note that you appear to have compiled your code without optimization. I get this at -O2:

       0x0000000000000000 <+0>:     sub    $0x208,%rsp
       0x0000000000000007 <+7>:     mov    0x8(%rsi),%rsi
       0x000000000000000b <+11>:    mov    %rsp,%rdi
       0x000000000000000e <+14>:    call   <strcpy@PLT>
       0x0000000000000013 <+19>:    xor    %eax,%eax
       0x0000000000000015 <+21>:    add    $0x208,%rsp
       0x000000000000001c <+28>:    ret
    

    The stack adjustment is still somewhat larger than the array, though smaller than yours, and it is no longer a multiple of 16. The difference is that with optimization on, the frame pointer is eliminated, so %rbp is not pushed on entry. The stack pointer is therefore still congruent to 8, mod 16, when the sub instruction executes, and subtracting 0x208 (520, itself congruent to 8, mod 16) restores 16-byte alignment before the call.

    (Incidentally, there is no requirement anywhere for a stack frame to be as small as possible. "Quality of implementation" dictates that it should be as small as possible, but for various reasons it's quite common for the compiler to miss that target. In my optimized code dump, I don't see any reason why the immediate operand to sub and add couldn't have been 0x1f8 (504), which still holds the 500-byte array and leaves the stack pointer aligned at the call.)