Search code examples
cx86stack-framealloca

How did alloca() interact with other stack allocation?


Let's start from a simple example of stack allocation:

void f() {
    int a, b;
    ...
}

If I understand correctly. Then address of a and b has a fixed offset from stack base, namely the register ebp. That's the how compiler find them if we need them in afterwards.

But consider the following code.

void f(int n) {
    int a;
    alloca(n);
    int b;
    ...
}

If compiler do not do any optimization, the stack would be a->n->b. Now offset of b is dependent on n. Then what did compiler do?

Mimicking How does alloca() work on a memory level? . I tried the following code:

#include <stdio.h>
#include <alloca.h>

void foo(int n)
{
    int a;
    int *b = alloca(n * sizeof(int));
    int c;
    printf("&a=%p, b=%p, &c=%p\n", (void *)&a, (void *)b, (void *)&c);
}

int main()
{
    foo(5);
    return 0;
}

The output is &a=0x7fffbab59d68, b=0x7fffbab59d30, &c=0x7fffbab59d6c. This time address of a and c looks neighboring. Did compiler do some reordering? And if we do not allow compiler to reorder how would compiler find address of c?

------------Some Update------------

Alright, as long as you believe compilers really matters, let's try x86-64 gcc 13.2, I've modify the code a little bit.

#include <alloca.h>
void alloca_test(int n) {
    int a;
    int* ptr = (int *) alloca(n);
    int b;
    a++;
    b++;
    ptr[0]++;
}

and here's the assembly:

alloca_test(int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 48
        mov     DWORD PTR [rbp-36], edi
        mov     DWORD PTR [rbp-4], 0
        mov     eax, DWORD PTR [rbp-36]
        cdqe
        lea     rdx, [rax+8]
        mov     eax, 16
        sub     rax, 1
        add     rax, rdx
        mov     ecx, 16
        mov     edx, 0
        div     rcx
        imul    rax, rax, 16
        sub     rsp, rax
        mov     rax, rsp
        add     rax, 15
        shr     rax, 4
        sal     rax, 4
        mov     QWORD PTR [rbp-16], rax
        mov     DWORD PTR [rbp-20], 0
        add     DWORD PTR [rbp-4], 1    <--a++
        add     DWORD PTR [rbp-20], 1   <--b++
        mov     rax, QWORD PTR [rbp-16]
        mov     eax, DWORD PTR [rax]
        lea     edx, [rax+1]
        mov     rax, QWORD PTR [rbp-16]
        mov     DWORD PTR [rax], edx
        nop
        leave
        ret

Here b has address [rbp-20], which don't change w.r.t n


Solution

  • Actually, for a really non-optimizing compiler, it would be the other way around.

    Let's imagine a naive, 1970s-tech compiler, since the C language was originally designed to be implemented by compilers like that. And let's ignore the alloca call for the moment. As the compiler parses the function, each time it encounters a definition of a local variable [*], it assigns it a stack slot, relative to the frame pointer ebp: so a at [ebp-4], b at [ebp-8] and so on, and uses that to address the variable. When it finishes parsing, it has seen all the local variables and can compute the total amount of stack needed, and so insert the appropriate constant in the prologue code that adjusts the stack pointer (e.g. sub esp, 8). Even a very simple one-pass compiler that emits code line by line can do this, by emitting something like sub esp, 0 in the prologue and then back-patching later to replace the immediate 0 by the correct value.

    Now, as for alloca, it wasn't part of the original C language. Rather, it was basically a "cool hack" that someone discovered, that can be implemented in a way that is completely orthogonal to the process above. We can describe the idea as follows in x86 terms. (The original implementation would have been for PDP-11 or VAX or something like that, but the idea is the same.) Since all the locals are addressed relative to ebp, then it doesn't matter if the stack pointer esp is decremented further during the execution of the function; the compiler never refers to esp. And the stack cleanup in the function epilogue is normally implemented as mov esp, ebp instead of add esp, 8, so it will also continue to work fine.

    So in fact the compiler doesn't even have to know that alloca(n) does anything special with regard to the stack. It can be a macro that expands to inline assembly like sub esp, [ebp+8] / mov eax, esp, which, again, can be opaque to the compiler other than filling in an addressing mode for n (in this case the first stack arg, at [ebp+8] assuming traditional use of EBP as a frame pointer). It would play no role at all in the process of allocating stack slots for local variables, since this hypothetical compiler will only ever access locals relative to a frame pointer (EBP), not ESP.

    In fact, if you are willing to do a little more stack juggling, alloca can even be an external library function, that the compiler just treats like any other function call. This is why alloca has the syntax of a function call, rather than being integrated more deeply into the language - the original implementation was simply a function call.

    So with this implementation, we'd have a and b at the top of the stack frame (allocated simultaneously in the function prologue), and the allocaed buffer below them (allocated at the point of the alloca call). If we had more calls to alloca, then they would return progressively lower addresses in the order they were executed, subtracting from the stack pointer each time.

    An alloca that the compiler didn't understand could break things if done in the middle of pushing args for a call to another function, so presumably bar(1, 2, alloca(n), 3) wasn't safe with early hacky implementations.


    Now as you might imagine, this "cool hack" didn't work so well once compilers got smarter and wanted to be able to actually control the stack pointer throughout the execution of the function. Then changing the stack pointer behind the compiler's back, as it were, becomes disastrous, so alloca had to be given special compiler support, causing a lot of pain for compiler writers. The idea of alloca was eventually fully adopted into the language when C99 introduced variable-length arrays, but many observers regard that decision as a mistake.

    So as for what a modern compiler does, there's not necessarily any single answer. It knows exactly what semantics alloca needs to provide, and it can make its own decisions as to how to implement that. It's not necessarily constrained to using ebp-relative addressing for all locals; it can use esp-relative, or compute the address some other way, or maybe just optimize variables into registers so that they don't occupy a stack slot at all. So we can't easily predict what the stack layout will look like.


    [*] Note, in passing, that in assigning stack slots and computing stack usage, the block structure of the function would typically be flattened, and the scopes of the variables would not be taken into account. So even in code like

    int a;
    if (...) {
        int b;
        ...
    }
    while (...) {
        int c;
        ...
    }
    

    Here a,b,c will each get their own stack slots, say [ebp-4], [ebp-8], [ebp-12] respectively. The stack pointer will be adjusted by 12 in the prologue. In particular, the compiler will not emit instructions to readjust the stack pointer at the beginning and end of each { } blocks. It could, however, optimize by noticing that the scopes of b and c do not overlap, and so it could assign [ebp-8] to both of them, and use only 8 bytes of stack instead of 12.