Tags: macos, gcc, assembly, x86, abi

Activation records - C


Please consider the below program:

#include <stdio.h>

void my_f(int);

int main()
{
    int i = 15;
    my_f(i);
}

void my_f(int i)
{
    int j[2] = {99, 100};
    printf("%d\n", j[-2]);
}

My understanding is that the activation record (aka stack frame) for my_f() should look like this:

    ------------
    |     i    |    15
    ------------
    | Saved PC |    Address of next instruction in caller function
    ------------
    |   j[0]   |    99
    ------------
    |   j[1]   |    100
    ------------

I expected j[-2] to print 15, but it prints 0. Could someone please explain what I am missing here? I am using GCC 4.0.1 on OS X 10.5.8 (yes, I live under a rock, but that's beside the point here).


Solution

  • If you ever actually want the address of your stack frame in GNU C, use __builtin_frame_address(0) (non-zero args attempt to backtrace up the stack to parent stack frames). This is the address of the first thing pushed by the function, i.e. a saved ebp or rbp if you compiled with -fno-omit-frame-pointer. If you want to modify the return address on the stack, you might be able to do that with an offset from __builtin_frame_address(0), but to just read it reliably, use __builtin_return_address(0).
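As a minimal sketch of the two builtins mentioned above (GNU C extensions, so this assumes gcc or clang): the struct frame_info type and the get_frame_info() and show_frame_info() names are made up for illustration. The values printed will differ between runs, compilers, and optimization levels.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the two values; not part of any library. */
struct frame_info {
    void *frame;          /* result of __builtin_frame_address(0)  */
    void *return_address; /* result of __builtin_return_address(0) */
};

/* noinline keeps this function from being merged into its caller, so
   "level 0" really refers to this function's own frame. */
__attribute__((noinline))
struct frame_info get_frame_info(void)
{
    struct frame_info info;
    info.frame = __builtin_frame_address(0);
    info.return_address = __builtin_return_address(0);
    return info;
}

/* Hypothetical helper: print both values for inspection. */
void show_frame_info(void)
{
    struct frame_info info = get_frame_info();
    printf("frame address:  %p\n", info.frame);
    printf("return address: %p\n", info.return_address);
}
```

Calling show_frame_info() from main prints two pointers. The return address should land inside show_frame_info's code, which you can check against a disassembly.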


    GCC keeps the stack 16-byte aligned in the usual x86 ABIs. There could easily be a gap between the return address and j[1]. In theory, the compiler could put j[] as far down as it wanted, or optimize it away (or turn it into a read-only static constant, since nothing writes it).

    If you compiled with optimization, i probably isn't stored anywhere, and my_f(int i) is inlined into main.

    Also, like @EOF said, j[-2] is two spots below the bottom of your diagram. (Low addresses are at the bottom, because the stack grows down). Also note that the diagram on wikipedia (from the link I edited into the question) is drawn with low addresses at the top. The ASCII diagram in my answer has low addresses at the bottom.

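    If you compiled with -O0, then there's some hope. But in 64-bit code (the default target for 64-bit builds of gcc and clang), the calling convention passes the first 6 integer args in registers, so the caller never stores i in a stack arg slot above the return address. (At -O0 the callee typically spills the register to a slot inside its own frame instead, at an offset the compiler chooses.)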

    Also, in AMD64 code, j[3] might be the upper half of the return address (or the saved %rbp), if j[] is placed below one of those with no gap. (pointers are 64bit, int is still 32 bits.) j[2], the first out-of-bounds element, would alias onto the low 32bits (aka low dword in Intel terminology, where a "word" is 16 bits.)
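To make the low-half / high-half aliasing concrete without relying on undefined behaviour, here is a small sketch that splits a 64-bit value into the two 32-bit chunks an int-sized load would see. It assumes a little-endian machine such as x86. The dword_at() name is hypothetical, and the example value in the usage note is made up, not a real return address.

```c
#include <stdint.h>
#include <string.h>

/* Return 32-bit chunk `index` (0 = lowest address, 1 = next) of a 64-bit
   value, as an int-sized load from that address would read it.
   memcpy keeps this well-defined, unlike indexing past the end of j[]. */
uint32_t dword_at(uint64_t value, unsigned index)
{
    uint32_t chunks[2];
    memcpy(chunks, &value, sizeof chunks);
    return chunks[index & 1u]; /* mask keeps the index in bounds */
}
```

On a little-endian machine, dword_at(UINT64_C(0x00007fff12345678), 0) gives 0x12345678 (what j[2] would see in the scenario above) and index 1 gives 0x00007fff (what j[3] would see).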


    The best hope for this to work is in un-optimized 32-bit code, using a calling convention with no register-args (e.g. the x86 32-bit SysV ABI; see also the tag wiki).

    In that case, your stack will look like:

    # 32bit stack-args calling convention, unoptimized code
    
      higher addresses
    ^^^^^^^^^^^^
    | argv     |
    ------------
    | argc     |
    -------------------
    | main's ret addr |
    -------------------
    |   ...    |
    |  main()'s local variables and stuff, layout decided by the compiler
    |   ...    |
    ------------
    |     i    |    # function arg
    ------------ <--   16B-aligned boundary for the first arg, as required in the ABI
    | ret addr |
    ------------ <--- esp pointer on entry to the function
    |saved ebp |  # because gcc -m32 -O0 uses -fno-omit-frame-pointer
    ------------ <--- ebp after  mov ebp, esp  (part of no-omit-frame-pointer)
      unpredictable amount of padding, up to the compiler.  (likely 0 bytes in this case)
      but actually not: clang 3.5 for example makes a copy of its arg (`i`) here, and puts j[] right below that, so j[2] or j[5] will work
    ------------
    |  j[1]    |
    ------------
    |  j[0]    |
    ------------
    |          |
    vvvvvvvvvvvv   Lower addresses.  (The wikipedia diagram is upside-down, IMO: it has low addresses at the top).
    

    It's somewhat likely that the 8 byte j array will be placed right below the value written by push ebp, with no gap. That would make j[0] 16B-aligned, although there's no requirement or guarantee that local arrays have any particular alignment. (Except that C99 variable-length arrays are 16B-aligned, in the AMD64 SysV ABI. I don't remember there being a guarantee for non-variable length arrays, but I didn't check.)

    If the function saved any other call-preserved registers (like ebx) so it could use them, those saved registers would be before or after the saved ebp, above space used for locals.

    j[4] might work in 32bit code, like @EOF suggested. I assume he arrived at 4 by the same reasoning I did, but forgot to mention that it only applies to 32bit code.


    Looking at the asm:

    Of course, looking at what really happens is much better than all this guessing and hand-waving.

    I put your function on the Godbolt compiler explorer, with the oldest gcc version it has (4.4.7), using -xc -O0 -Wall -fverbose-asm -m32. -xc is to compile as C, not C++.

    my_f:
        push    ebp     #
        mov     ebp, esp  #,
        sub     esp, 40   #,              # no idea why it reserves 40 bytes.  clang 3.5 only reserves 24
        mov     DWORD PTR [ebp-16], 99    # j[0]
        mov     DWORD PTR [ebp-12], 100   # j[1]
        mov     edx, DWORD PTR [ebp+0]    ######   This is the j[4] load
        mov     eax, OFFSET FLAT:.LC0     # put the format string address into eax
        mov     DWORD PTR [esp+4], edx    # store j[4] on the stack, to become an arg for printf
        mov     DWORD PTR [esp], eax      # store the format string
        call    printf  #
        leave
        ret
    

    So gcc puts j at ebp-16, not the ebp-8 that I guessed. j[4] gets the saved ebp. i is at j[6], 8 more bytes up the stack.
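The index arithmetic behind those numbers can be written out as a tiny helper. This is only a sketch of the address calculation for this particular gcc 4.4 -m32 -O0 output: ebp_offset_of_element() and X86_INT_SIZE are hypothetical names, and the 4-byte int is an assumption that holds on x86.

```c
/* Assumption: int is 4 bytes, as on 32-bit and 64-bit x86. */
enum { X86_INT_SIZE = 4 };

/* Offset from ebp of j[index], given that j[0] lives at ebp + base_offset.
   Pure arithmetic; it does not touch the stack. */
long ebp_offset_of_element(long base_offset, long index)
{
    return base_offset + index * X86_INT_SIZE;
}
```

With j[0] at ebp-16, index 4 gives offset 0 (the saved ebp), index 5 gives +4 (the return address), and index 6 gives +8 (the arg slot holding i). The j[-2] from the question gives -24, which is further down inside my_f's own frame.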

    Remember, all we've learned here is what gcc 4.4 happens to do at -O0. There's no rule that says j[6] will refer to a location that holds a copy of i on any other setup, or with different surrounding code.

    If you want to learn asm from compiler output, look at the asm from -Og or -O1 at least. -O0 stores everything to memory after every statement, so it's very noisy / bloated, which makes it harder to follow. Depending on what you want to learn, -O3 is good. Obviously you have to write functions that do something with input parameters instead of compile-time constants, so they don't optimize away. See How to remove "noise" from GCC/clang assembly output? (especially the link to Matt Godbolt's CppCon2017 talk), and other links in the tag wiki.
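For instance, a function like the following sketch is a reasonable thing to feed to the compiler at -O1 or higher. Its work depends on run-time arguments, so when it is compiled on its own the result cannot be folded into a constant. The sum_ints() name is made up for illustration.

```c
#include <stddef.h>

/* Sums `count` ints starting at `values`. Both inputs come from the caller,
   so the loop survives optimization when this is compiled by itself. */
int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Compare that with my_f() in the question, where j[] holds only compile-time constants, so an optimizing compiler is free to drop the array entirely.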


    clang 3.5, as noted in the diagram, copies i from the arg slot to a local. However, when it calls printf, it copies from the arg slot again, not from the copy inside its own stack frame.