How does the compiler allocate memory for conditionally declared automatic variables in C++?


Say I have a function that, depending on some runtime condition, creates either a cheap automatic object or an expensive one:

void foo() {
   if (runtimeCondition) {
       int x = 0;
   } else {
       SuperLargeObject y;
   }
}

When the compiler allocates memory for a stack frame for this function, will it just allocate enough memory to store the SuperLargeObject, and if the condition leading to the int is true that extra memory will just be unused? Or will it allocate memory some other way?


Solution

  • It depends on your compiler and on the optimization settings. In unoptimized builds most C++ compilers will probably allocate stack memory for both objects and use one or the other based on which branch is taken. In optimized builds things get more interesting:

    If both objects (the int and the SuperLargeObject) are not used and the compiler can prove that constructing SuperLargeObject does not have side effects, both allocations will be elided.

    If the objects escape the function, i.e. their addresses are passed to another function, the compiler has to provide memory for them. But since their lifetimes don't overlap, they can be stored in overlapping memory regions. It is up to the compiler if that actually happens or not.

    As you can see here, different compilers generate different assembly for these two functions: (Modified example from OP and reference, all compiled for x86-64)

    void escape(void const*);
    
    struct SuperLargeObject {
        char data[104];
    };
    
    void f(bool cond) {
        if (cond) {
            int x;
            escape(&x);
        }
        else {
            SuperLargeObject y;
            escape(&y);
        }
    }
    
    void g() {
        SuperLargeObject y;
        escape(&y);
    }
    

    Note that all stack allocations are odd multiples of 8, because the x86-64 ABI requires the stack pointer to be 16-byte aligned at each call, and the call instruction that entered the function has already pushed 8 bytes for the return address (Thanks to @PeterCordes for explaining this to me on another post).
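
    The arithmetic behind that note can be sketched as follows. This is only an illustration: min_frame_adjust, plausible_adjust and verify_quoted_listings are hypothetical names, not part of any ABI header or compiler, and the sketch assumes a function that pushes no callee-saved registers and has no over-aligned locals. It computes the smallest `sub rsp, N` that fits a given number of bytes of locals; a compiler is free to reserve more.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest N for `sub rsp, N` that holds `locals` bytes
// and leaves rsp 16-byte aligned at the next call, assuming rsp is 8 bytes
// past a 16-byte boundary on entry (the return address was just pushed).
constexpr std::size_t min_frame_adjust(std::size_t locals) {
    // Round (locals + 8) up to a multiple of 16, then take the 8 back out.
    return ((locals + 8 + 15) / 16) * 16 - 8;
}

// A value seen in a listing is plausible if it is an odd multiple of 8
// and at least the minimum. Compilers may reserve more than the minimum.
constexpr bool plausible_adjust(std::size_t locals, std::size_t observed) {
    return observed % 16 == 8 && observed >= min_frame_adjust(locals);
}

static_assert(min_frame_adjust(104) == 104, "104 bytes of locals fit exactly");
static_assert(min_frame_adjust(104 + 4) == 120, "separate int and 104-byte object");

// Spot-check against the sub rsp values in the listings of this answer.
void verify_quoted_listings() {
    assert(plausible_adjust(104 + 4, 120)); // ICC f: two separate regions
    assert(plausible_adjust(104, 104));     // ICC g, Clang f and g
    assert(plausible_adjust(104, 120));     // GCC f and g: more than the minimum
}
```

    Under these assumptions 104 bytes of locals need exactly 104, while 104 + 4 bytes need 120. The helper only gives a lower bound, so a larger odd multiple of 8 is still consistent with it.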

    ICC

    f(bool):
            sub       rsp, 120
            test      dil, dil
            lea       rax, QWORD PTR [104+rsp]
            lea       rdx, QWORD PTR [rsp]
            cmovne    rdx, rax
            mov       rdi, rdx
            call      escape(void const*)
            add       rsp, 120
            ret
    g():
            sub       rsp, 104
            lea       rdi, QWORD PTR [rsp]
            call      escape(void const*)
            add       rsp, 104
            ret
    

    ICC seems to allocate enough memory to store both objects. It then uses cmov to select between the two non-overlapping regions based on the runtime condition (the int at rsp+104, the SuperLargeObject at rsp) and passes the selected pointer to the escaping function.

    In the reference function g it only allocates 104 bytes, exactly the size of SuperLargeObject.

    GCC

    f(bool):
            sub     rsp, 120
            mov     rdi, rsp
            call    escape(void const*)
            add     rsp, 120
            ret
    g():
            sub     rsp, 120
            mov     rdi, rsp
            call    escape(void const*)
            add     rsp, 120
            ret
    

    GCC also allocates 120 bytes, but it places both objects at the same address and thus emits no cmov instruction.

    Clang

    f(bool):
            sub     rsp, 104
            test    edi, edi
            mov     rdi, rsp
            call    escape(void const*)@PLT
            add     rsp, 104
            ret
    g():
            sub     rsp, 104
            mov     rdi, rsp
            call    escape(void const*)@PLT
            add     rsp, 104
            ret
    

    Clang also merges the two allocations and, unlike GCC, reduces the allocation size to the necessary 104 bytes.

    Unfortunately I don't understand why it tests the condition in function f.


    Note also that when the compiler can place either or both of the variables in registers, no memory will be allocated at all, even when they are used and reassigned throughout the function. For ints, longs and other small objects that is most often the case, as long as their addresses do not escape the function.
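
    If you want to see what your own compiler does with the two branches, a rough runtime probe along these lines may help. It is a sketch, not a reliable measurement: Range, range_of, probe, overlaps and report are hypothetical names, volatile is used here only to keep both objects in memory rather than in registers, and inlining or optimization settings can change the addresses you observe. Reading the generated assembly remains the more trustworthy method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct SuperLargeObject {
    char data[104];
};

// Half-open address range [begin, end), kept as integers so it can still be
// compared after the object it described has gone out of scope.
struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Hypothetical helper: turns an object's address and size into a Range.
Range range_of(void const volatile* p, std::size_t size) {
    std::uintptr_t b = reinterpret_cast<std::uintptr_t>(p);
    return Range{b, b + size};
}

// Same shape as f(bool) above; volatile keeps both objects in memory.
Range probe(bool cond) {
    if (cond) {
        volatile int x = 0;
        return range_of(&x, sizeof x);
    } else {
        volatile SuperLargeObject y = {};
        return range_of(&y, sizeof y);
    }
}

// True if the two half-open ranges share at least one byte.
bool overlaps(Range a, Range b) {
    assert(a.begin <= a.end && b.begin <= b.end);
    return a.begin < b.end && b.begin < a.end;
}

// Prints where each object lived and whether the storage was shared.
void report() {
    Range x = probe(true);
    Range y = probe(false);
    std::printf("x: [%#llx, %#llx)\n",
                static_cast<unsigned long long>(x.begin),
                static_cast<unsigned long long>(x.end));
    std::printf("y: [%#llx, %#llx)\n",
                static_cast<unsigned long long>(y.begin),
                static_cast<unsigned long long>(y.end));
    std::printf("storage %s\n", overlaps(x, y) ? "overlaps" : "does not overlap");
}
```

    Calling report() from main prints the two ranges. With a layout like the ICC listing (int at offset 104, large object at offset 0) the ranges would not overlap, whereas with a layout like the GCC or Clang listing (both at rsp) they would.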