Tags: gcc, stack, alignment, program-entry-point

How can I tell GCC not to align main's stack to 16-byte boundary?


GCC is doing some voodoo where it aligns my main's stack and saves the location of the arguments in ecx:

0x08049060      8d4c2404       lea ecx, [arg_4h]                       ; 4 ; [13] -r-x section size 465 named .text
0x08049064      83e4f0         and esp, 0xfffffff0
0x08049067      ff71fc         push dword [ecx - 4]
0x0804906a      55             push ebp
0x0804906b      89e5           mov ebp, esp
0x0804906d      51             push ecx

And then, later:

0x080490a4      8b4dfc         mov ecx, dword [local_4h]
0x080490a7      83c410         add esp, 0x10
0x080490aa      c9             leave
0x080490ab      8d61fc         lea esp, [ecx - 4]
0x080490ae      c3             ret
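
For reference, the `and esp, 0xfffffff0` step in the prologue above clears the low four bits of the stack pointer, which rounds it down to a 16-byte boundary. A minimal C sketch of that arithmetic follows; the helper name `align_down_16` is made up for illustration and does not come from GCC or the tutorial.

```c
#include <stdint.h>

/* Mirrors `and esp, 0xfffffff0`: clearing the low 4 bits rounds the
   value down to a multiple of 16. Hypothetical helper name. */
static uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xfffffff0u;
}
```

Because the original esp is lost by this rounding, the prologue first saves a pointer to the arguments in ecx, and the epilogue restores esp from it with `lea esp, [ecx - 4]`.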

I believe I understand why GCC is doing what it's doing (you can read about it here), but the tutorial binary that I'm trying to rebuild from source lacks these instructions, and I want to generate assembly as close to the tutorial's as possible.

Both binaries return the same output from file:

stack0: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.18,


I've tried __attribute__ ((packed)) on main and #pragma pack, neither of which worked.


Solution

  • You can disable the stack alignment by telling GCC to align to 2^2 = 4 bytes rather than 16:

    -mpreferred-stack-boundary=2
    

    There will likely be some performance implications.


    Caveat from Peter Cordes:

    This will violate the ABI for the whole rest of the program, not maintaining the 16-byte alignment that the Linux version of the i386 SysV ABI guarantees on program entry and before calls to other functions. So compiler-generated code using SSE instructions may segfault. It may be possible with a function attribute on main to tell it that the incoming stack alignment is correct. This might not be a problem for your use-case, but it's an important caveat. Also it can break _Atomic uint64_t local vars that other threads get references to.
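
The value passed to -mpreferred-stack-boundary is an exponent, not a byte count: N requests 2^N-byte alignment, so 2 gives 4 bytes and GCC's default of 4 gives 16. A small C sketch of that mapping, where the helper name `stack_boundary_bytes` is hypothetical and only illustrates the arithmetic:

```c
#include <stddef.h>

/* -mpreferred-stack-boundary=N asks GCC to keep the stack aligned
   to 2^N bytes. Hypothetical helper, for illustration only. */
static size_t stack_boundary_bytes(unsigned n)
{
    return (size_t)1 << n;
}
```

Assuming a 32-bit build of a source file called stack0.c (the file name is a guess based on the binary name in the file output above), the flag would be passed roughly as `gcc -m32 -mpreferred-stack-boundary=2 stack0.c -o stack0`.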