javascript memory-management closures v8

How does V8 handle stack-allocated variables in closures?


I have read a lot of articles that say "V8 uses the stack for allocating primitives like numbers". I have also read that the GC works only on heap allocations. But if I combine stack-allocated variables with a closure, who is in charge of freeing the stack-allocated variable?

For example:

function foo() {
    const b = 5;

    return function bar(x) {
        return x * b
    }
}

// This call allocates the variable `b` on the stack
// and the code of `bar` on the heap
const bar = foo()
// here `b` should be freed

// here `b` is used, so it should not have been freed
bar(2)

How does this work? How can the bar function point to b if b lives on the stack? How is the [[Environment]] built here?


Solution

  • (V8 developer here.)

    I don't know where this myth is coming from that "primitives are allocated on the stack". It's generally false: the regular case in JavaScript is that everything is allocated on the heap, primitive or not.

    There may be implementation-specific special cases where some heap allocations can be optimized out and/or replaced by stack allocations, but that's the exception, not the rule; and it's never directly observable (i.e. never changes behavior, only performance), because that's the general rule for all internal optimizations.

    To dive deeper, we need to distinguish two concepts: variables themselves, and the things they point at.

    A variable can be thought of as a pointer. In other words, it's not in itself the "container" or "space" where an object is allocated; instead it's a reference that points at an object that's allocated elsewhere. All variables have the same size (one pointer), whereas the things they point at can vary wildly in size. One illustrative consequence is that the same variable can point at different things over time; for example, you could have a loop over an array where element = array[i] points at each array element in turn.
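    As a small illustration of the "variable as pointer" view, here is a sketch in plain JavaScript. The names (original, alias, seen) are made up for this example. It shows that assigning one variable to another copies the reference rather than the object, and that a single variable can refer to a different object on each loop iteration.

```javascript
// Two variables can refer to the same object: assignment copies the
// reference, not the object it points at.
const original = { label: "shared" };
const alias = original;
alias.label = "changed through alias";

// One variable can refer to different objects over time.
const array = [{ id: 1 }, { id: 2 }, { id: 3 }];
const seen = [];
let element;
for (let i = 0; i < array.length; i++) {
  element = array[i];               // `element` now refers to the i-th object
  seen.push(element === array[i]);  // same object identity, no copy was made
}
```

    After the loop, element refers to the last array entry, and the change made through alias is visible through original because both names refer to one object.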
    In modern, high-performance JS engines, function-local variables are usually stored on the stack (regardless of what they point at!). That's because this is both fast and convenient. So while this is technically still an implementation-specific exception to the rule that everything is allocated on the heap, it's a fairly common exception.
    As you rightly observe, storing variables on the stack doesn't work if they need to survive the function that created them. Therefore, JavaScript engines perform analysis passes to find out which variables are referenced from nested closures, and store these variables on the heap right away, in order to allow them to stay around as long as they are needed.
    I wouldn't be surprised if an engine that prefers simplicity over performance chose to always store all variables on the heap, so it wouldn't have to distinguish several cases.
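    The effect described above can be observed from JavaScript, even though the storage location itself cannot. The following is a minimal sketch: foo and bar come from the question, while makeCounter and initial are hypothetical names added for illustration. The comments about where a variable "may" live are assumptions about what an engine is free to do, not a description of any particular engine's output.

```javascript
function foo() {
  const b = 5; // referenced by `bar`, so it has to outlive the call to foo()
  return function bar(x) {
    return x * b;
  };
}

const bar = foo();     // foo() has returned, yet `b` is still reachable via `bar`
const result = bar(3); // 3 * 5

function makeCounter(start) {
  // Used only during this call and not referenced by a closure, so an
  // engine is free to keep it on the stack or in a register.
  const initial = Number(start) || 0;
  // Referenced by both closures below, so it has to survive the call.
  let count = initial;
  return {
    increment: () => ++count,
    current: () => count,
  };
}

// Each call creates a fresh `count`; the two closures from one call share it.
const c1 = makeCounter(0);
const c2 = makeCounter(10);
c1.increment();
c1.increment();
c2.increment();
```

    Here c1 and c2 count independently, which shows that every call to makeCounter gets its own copy of the captured variable, and that increment and current from the same call see the same one.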

    Regarding the value that the variable points at: that's always on the heap, regardless of its type or primitive-ness (with exceptions to the rule, see below).
    var a = true --> true is on the heap.
    var b = "hello" --> "hello" is on the heap.
    var c = 42.2 --> 42.2 is on the heap.
    var d = 123n --> 123n is on the heap.
    var e = new Object(); --> the object is on the heap.
    Again, there are engine-specific cases where heap allocations can be optimized out under the right circumstances. For example, V8 (inspired by some other VMs) has a well-known trick where it can store small integers ("Smis") directly in the pointer using a tag bit. In this case the pointer doesn't actually point at a value; the pointer is the value, so to speak. An alternative trick is called "NaN-boxing". It's used e.g. by SpiderMonkey and has the effect that all JS Numbers can be stored directly in the pointer (or technically the other way round: everything's a Number in this approach, and pointers are stored as special numbers).
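    To make the two tricks more concrete, here is a rough model of both in JavaScript. Real engines do this in C++ on machine words, so everything below is a simplified sketch. All function names are hypothetical, the 32-bit word size is an assumption chosen to keep the arithmetic simple, and the NaN-boxing bit layout is an illustrative choice, not SpiderMonkey's actual encoding.

```javascript
// --- Small-integer tagging ---
// A "word" is modeled as a 32-bit integer. Lowest bit 0 marks a small
// integer, lowest bit 1 marks a pointer to a heap object.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

function tagSmi(value) {
  const fits = Number.isInteger(value) && !Object.is(value, -0) &&
    value >= SMI_MIN && value <= SMI_MAX;
  if (!fits) throw new RangeError("value needs a heap-allocated number");
  return value << 1; // payload in the upper 31 bits, tag bit stays 0
}
const isSmi = (word) => (word & 1) === 0;
const untagSmi = (word) => word >> 1; // arithmetic shift keeps the sign
const tagPointer = (alignedAddress) => alignedAddress | 1; // address assumed even

// --- NaN-boxing ---
// Every value is a 64-bit double. Pointers hide in the payload bits of a
// NaN pattern that ordinary arithmetic results are never stored as here.
const buffer = new ArrayBuffer(8);
const f64 = new Float64Array(buffer);
const u64 = new BigUint64Array(buffer);

const CANONICAL_NAN = 0x7ff8000000000000n; // the one NaN used for real numbers
const BOX_TAG = 0xfff8000000000000n;       // assumed tag for boxed pointers
const PAYLOAD_MASK = 0x0000ffffffffffffn;  // low 48 bits carry the address

function numberToBits(value) {
  if (Number.isNaN(value)) return CANONICAL_NAN; // never collides with BOX_TAG
  f64[0] = value;
  return u64[0];
}
function bitsToNumber(bits) {
  u64[0] = bits;
  return f64[0];
}
const boxPointer = (address) => BOX_TAG | (address & PAYLOAD_MASK); // BigInt address
const isBoxedPointer = (bits) => (bits & BOX_TAG) === BOX_TAG;
const unboxPointer = (bits) => bits & PAYLOAD_MASK;
```

    In the first scheme, tagSmi(42) yields the word 84, which passes isSmi without any memory access, while tagPointer(0x1000) does not. In the second, numberToBits(42.2) round-trips through bitsToNumber unchanged, and boxPointer(0x1234n) produces a bit pattern that reads as NaN when interpreted as a double but still carries the address in its payload.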
    As another example, once a function gets hot enough for optimization, an optimizing compiler may be able to figure out that a given object isn't accessible outside the function and hence doesn't need to be allocated at all; if necessary some of the object's properties will be held in registers or on the stack for the part of the function where they are needed.

    So, to summarize the above:

    • "All primitives are allocated on the stack" is incorrect. Most primitives are allocated on the heap.
    • Sometimes, an engine can avoid allocations (of both primitives and objects), which may or may not mean that the respective value is briefly held on the stack (it could also be eliminated entirely, or only ever held in registers). Such optimizations never change observable behavior; in cases where doing the optimization would affect behavior, the optimization can't be applied.
    • Variables, regardless of what they refer to, are stored on the heap or on the stack or not at all, depending on the requirements of the situation.