Let's start from a simple example of stack allocation:
void f() {
int a, b;
...
}
If I understand correctly. Then address of a
and b
has a fixed offset from stack base, namely the register ebp
. That's the how compiler find them if we need them in afterwards.
But consider the following code.
void f(int n) {
int a;
alloca(n);
int b;
...
}
If compiler do not do any optimization, the stack would be a->n->b
. Now offset of b
is dependent on n
. Then what did compiler do?
Mimicking How does alloca() work on a memory level? . I tried the following code:
#include <stdio.h>
#include <alloca.h>
void foo(int n)
{
int a;
int *b = alloca(n * sizeof(int));
int c;
printf("&a=%p, b=%p, &c=%p\n", (void *)&a, (void *)b, (void *)&c);
}
int main()
{
foo(5);
return 0;
}
The output is &a=0x7fffbab59d68, b=0x7fffbab59d30, &c=0x7fffbab59d6c
. This time address of a
and c
looks neighboring. Did compiler do some reordering? And if we do not allow compiler to reorder how would compiler find address of c
?
------------Some Update------------
Alright, as long as you believe compilers really matters, let's try x86-64 gcc 13.2, I've modify the code a little bit.
#include <alloca.h>
void alloca_test(int n) {
int a;
int* ptr = (int *) alloca(n);
int b;
a++;
b++;
ptr[0]++;
}
and here's the assembly:
alloca_test(int):
push rbp
mov rbp, rsp
sub rsp, 48
mov DWORD PTR [rbp-36], edi
mov DWORD PTR [rbp-4], 0
mov eax, DWORD PTR [rbp-36]
cdqe
lea rdx, [rax+8]
mov eax, 16
sub rax, 1
add rax, rdx
mov ecx, 16
mov edx, 0
div rcx
imul rax, rax, 16
sub rsp, rax
mov rax, rsp
add rax, 15
shr rax, 4
sal rax, 4
mov QWORD PTR [rbp-16], rax
mov DWORD PTR [rbp-20], 0
add DWORD PTR [rbp-4], 1 <--a++
add DWORD PTR [rbp-20], 1 <--b++
mov rax, QWORD PTR [rbp-16]
mov eax, DWORD PTR [rax]
lea edx, [rax+1]
mov rax, QWORD PTR [rbp-16]
mov DWORD PTR [rax], edx
nop
leave
ret
Here b
has address [rbp-20]
, which don't change w.r.t n
Actually, for a really non-optimizing compiler, it would be the other way around.
Let's imagine a naive, 1970s-tech compiler, since the C language was originally designed to be implemented by compilers like that. And let's ignore the alloca
call for the moment. As the compiler parses the function, each time it encounters a definition of a local variable [*], it assigns it a stack slot, relative to the frame pointer ebp
: so a
at [ebp-4]
, b
at [ebp-8]
and so on, and uses that to address the variable. When it finishes parsing, it has seen all the local variables and can compute the total amount of stack needed, and so insert the appropriate constant in the prologue code that adjusts the stack pointer (e.g. sub esp, 8
). Even a very simple one-pass compiler that emits code line by line can do this, by emitting something like sub esp, 0
in the prologue and then back-patching later to replace the immediate 0
by the correct value.
Now, as for alloca
, it wasn't part of the original C language. Rather, it was basically a "cool hack" that someone discovered, that can be implemented in a way that is completely orthogonal to the process above. We can describe the idea as follows in x86 terms. (The original implementation would have been for PDP-11 or VAX or something like that, but the idea is the same.) Since all the locals are addressed relative to ebp
, then it doesn't matter if the stack pointer esp
is decremented further during the execution of the function; the compiler never refers to esp
. And the stack cleanup in the function epilogue is normally implemented as mov esp, ebp
instead of add esp, 8
, so it will also continue to work fine.
So in fact the compiler doesn't even have to know that alloca(n)
does anything special with regard to the stack. It can be a macro that expands to inline assembly like sub esp, [ebp+8] / mov eax, esp
, which, again, can be opaque to the compiler other than filling in an addressing mode for n
(in this case the first stack arg, at [ebp+8]
assuming traditional use of EBP as a frame pointer). It would play no role at all in the process of allocating stack slots for local variables, since this hypothetical compiler will only ever access locals relative to a frame pointer (EBP), not ESP.
In fact, if you are willing to do a little more stack juggling, alloca
can even be an external library function, that the compiler just treats like any other function call. This is why alloca
has the syntax of a function call, rather than being integrated more deeply into the language - the original implementation was simply a function call.
So with this implementation, we'd have a
and b
at the top of the stack frame (allocated simultaneously in the function prologue), and the alloca
ed buffer below them (allocated at the point of the alloca
call). If we had more calls to alloca
, then they would return progressively lower addresses in the order they were executed, subtracting from the stack pointer each time.
An alloca
that the compiler didn't understand could break things if done in the middle of pushing args for a call to another function, so presumably bar(1, 2, alloca(n), 3)
wasn't safe with early hacky implementations.
Now as you might imagine, this "cool hack" didn't work so well once compilers got smarter and wanted to be able to actually control the stack pointer throughout the execution of the function. Then changing the stack pointer behind the compiler's back, as it were, becomes disastrous, so alloca
had to be given special compiler support, causing a lot of pain for compiler writers. The idea of alloca
was eventually fully adopted into the language when C99 introduced variable-length arrays, but many observers regard that decision as a mistake.
So as for what a modern compiler does, there's not necessarily any single answer. It knows exactly what semantics alloca
needs to provide, and it can make its own decisions as to how to implement that. It's not necessarily constrained to using ebp
-relative addressing for all locals; it can use esp
-relative, or compute the address some other way, or maybe just optimize variables into registers so that they don't occupy a stack slot at all. So we can't easily predict what the stack layout will look like.
[*] Note, in passing, that in assigning stack slots and computing stack usage, the block structure of the function would typically be flattened, and the scopes of the variables would not be taken into account. So even in code like
int a;
if (...) {
int b;
...
}
while (...) {
int c;
...
}
Here a,b,c
will each get their own stack slots, say [ebp-4]
, [ebp-8]
, [ebp-12]
respectively. The stack pointer will be adjusted by 12 in the prologue. In particular, the compiler will not emit instructions to readjust the stack pointer at the beginning and end of each { }
blocks. It could, however, optimize by noticing that the scopes of b
and c
do not overlap, and so it could assign [ebp-8]
to both of them, and use only 8 bytes of stack instead of 12.