Please consider the below program:
#include <stdio.h>
void my_f(int);
int main()
{
int i = 15;
my_f(i);
}
void my_f(int i)
{
int j[2] = {99, 100};
printf("%d\n", j[-2]);
}
My understanding is that the activation record (aka stack frame) for my_f()
should look like this:
------------
| i | 15
------------
| Saved PC | Address of next instruction in caller function
------------
| j[0] | 99
------------
| j[1] | 100
------------
I expected j[-2] to print 15, but it prints 0. Could someone please explain what I am missing here? I am using GCC 4.0.1 on OS X 10.5.8 (Yes, I live under a rock, but that's beside the point here).
If you ever actually want the address of your stack frame in GNU C, use `__builtin_frame_address(0)` (non-zero args attempt to backtrace up the stack to parent stack frames). This is the address of the first thing pushed by the function, i.e. a saved `ebp` or `rbp` if you compiled with `-fno-omit-frame-pointer`. If you want to modify the return address on the stack, you might be able to do that with an offset from `__builtin_frame_address(0)`, but to just read it reliably use `__builtin_return_address(0)`.
GCC keeps the stack 16-byte aligned in the usual x86 ABIs. There could easily be a gap between the return address and `j[1]`. In theory, it could put `j[]` as far down as it wanted, or optimize it away (or to a read-only static constant, since nothing writes it).
If you compiled with optimization, `i` probably isn't stored anywhere, and `my_f(int i)` is inlined into `main`.
Also, like @EOF said, `j[-2]` is two spots below the bottom of your diagram. (Low addresses are at the bottom, because the stack grows down.) Also note that the diagram on Wikipedia (from the link I edited into the question) is drawn with low addresses at the top. The ASCII diagram in my answer has low addresses at the bottom.
If you compiled with `-O0`, then there's some hope. In 64-bit code (the default target for 64-bit builds of gcc and clang), the calling convention passes the first 6 args in registers, so the only `i` in memory will be in `main`'s stack frame.
Also, in AMD64 code, `j[3]` might be the upper half of the return address (or the saved `%rbp`), if `j[]` is placed below one of those with no gap. (Pointers are 64-bit, `int` is still 32 bits.) `j[2]`, the first out-of-bounds element, would alias onto the low 32 bits (aka the low dword in Intel terminology, where a "word" is 16 bits).
Your best chance of this working is in 32-bit code, using a calling convention with no register-args (e.g. the x86 32-bit SysV ABI; see also the x86 tag wiki).
In that case, your stack will look like:
# 32bit stack-args calling convention, unoptimized code
higher addresses
^^^^^^^^^^^^
| argv |
------------
| argc |
-------------------
| main's ret addr |
-------------------
| ... |
| main()'s local variables and stuff, layout decided by the compiler
| ... |
------------
| i | # function arg
------------ <-- 16B-aligned boundary for the first arg, as required in the ABI
| ret addr |
------------ <--- esp pointer on entry to the function
|saved ebp | # because gcc -m32 -O0 uses -fno-omit-frame-pointer
------------ <--- ebp after mov ebp, esp (part of no-omit-frame-pointer)
unpredictable amount of padding, up to the compiler. (likely 0 bytes in this case)
but actually not: clang 3.5 for example makes a copy of its arg (`i`) here, and puts j[] right below that, so j[2] or j[5] will work
------------
| j[1] |
------------
| j[0] |
------------
| |
vvvvvvvvvvvv Lower addresses. (The wikipedia diagram is upside-down, IMO: it has low addresses at the top).
It's somewhat likely that the 8-byte `j` array will be placed right below the value written by `push ebp`, with no gap. That would make `j[0]` 16B-aligned, although there's no requirement or guarantee that local arrays have any particular alignment. (Except that C99 variable-length arrays are 16B-aligned, in the AMD64 SysV ABI. I don't remember there being a guarantee for non-variable length arrays, but I didn't check.)
If the function saved any other call-preserved registers (like `ebx`) so it could use them, those saved registers would be before or after the saved `ebp`, above space used for locals.
`j[4]` might work in 32-bit code, like @EOF suggested. I assume he arrived at 4 by the same reasoning I did, but forgot to mention that it only applies to 32-bit code.
Of course, looking at what really happens is much better than all this guessing and hand-waving.
I put your function on the Godbolt compiler explorer, with the oldest gcc version it has (4.4.7), using `-xc -O0 -Wall -fverbose-asm -m32`. `-xc` is to compile as C, not C++.
my_f:
push ebp #
mov ebp, esp #,
sub esp, 40 #, # no idea why it reserves 40 bytes. clang 3.5 only reserves 24
mov DWORD PTR [ebp-16], 99 # j[0]
mov DWORD PTR [ebp-12], 100 # j[1]
mov edx, DWORD PTR [ebp+0] ###### This is the j[4] load
mov eax, OFFSET FLAT:.LC0 # put the format string address into eax
mov DWORD PTR [esp+4], edx # store j[4] on the stack, to become an arg for printf
mov DWORD PTR [esp], eax # store the format string
call printf #
leave
ret
So gcc puts `j` at `ebp-16`, not the `ebp-8` that I guessed. `j[4]` gets the saved `ebp`. `i` is at `j[6]`, 8 more bytes up the stack.
Remember, all we've learned here is what gcc 4.4 happens to do at `-O0`. There's no rule that says `j[6]` will refer to a location that holds a copy of `i` on any other setup, or with different surrounding code.
If you want to learn asm from compiler output, look at the asm from `-Og` or `-O1` at least. `-O0` stores everything to memory after every statement, so it's very noisy / bloated, which makes it harder to follow. Depending on what you want to learn, `-O3` is good. Obviously you have to write functions that do something with input parameters instead of compile-time constants, so they don't optimize away. See How to remove "noise" from GCC/clang assembly output? (especially the link to Matt Godbolt's CppCon2017 talk), and other links in the x86 tag wiki.
As noted in the diagram, clang 3.5 copies `i` from the arg slot to a local. However, when it calls `printf`, it copies from the arg slot again, not from the copy inside its own stack frame.