javascript, c++, object, garbage-collection, v8

Do languages like JS with a copying GC ever store anything on the cpu registers?


I am learning about GCs, and I know there's a thing called HandleScope which 'protects' your local variables from the GC and updates them if the GC copies the heap. For example, if I have a routine which adds together 2 values and I call it, it may invoke the garbage collector, which could move the Object that my value points to (or, worse, not even know that the Object is still referenced). A really minimal example:

#include <vector>

struct Value { void* object; }; // hypothetical placeholder: points at a GC-managed heap object

Value addValues(Value a, Value b);
std::vector<Value*> gc_vals_with_protection;

Value func(Value a, Value b)
{
    gc_vals_with_protection.push_back(&a); // set protection for a
    gc_vals_with_protection.push_back(&b); // set protection for b
    Value res = addValues(a, b); // do calculations
    gc_vals_with_protection.pop_back(); // remove protection for b
    gc_vals_with_protection.pop_back(); // remove protection for a
    return res;
}

But this has got me thinking: it means that a and b will NEVER be in physical CPU registers, because you have taken their addresses (and CPU registers don't have addresses), which will make calculations on them inefficient. Also, at the beginning of every function, you would have to push back twice to the vector (https://godbolt.org/z/dc6vY1Yc5 for assembly).

I think I may be missing something, as this can't be optimal. Is there another trick I am missing?


Solution

  • (V8 developer here.)

    Do languages like JS with a copying GC ever store anything on the cpu registers?

    Yes, of course. Pretty much anything at all that a CPU does involves its registers.

    That said, JavaScript objects are generally allocated on the heap anyway, for at least the following reasons:
    (1) They are bigger than a register. So registers typically hold pointers to objects on the heap. It's these pointers, not the objects themselves, that Handles are needed for (both to update them, and to inform the GC that there are references to the object in question, so the object must not be freed).
    (2) They tend to be much longer-lived than the typical amount of time you can hold something in a register, which is only a couple of machine instructions: since the set of registers is so small, they are reused for something else all the time (regardless of JavaScript or GC etc), so whatever they held before will either be spilled (usually though not necessarily to the stack), or re-read from wherever it originally came from next time it's needed.
    (3) They have "pointer identity": JavaScript code like obj1 === obj2 (for objects, not primitives) only works correctly when there is exactly one location where an object is stored. Trying to store objects in registers would imply copying them around, which would break this.
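
    To make point (3) concrete, here is a minimal C++ sketch that models JavaScript's obj1 === obj2 as a comparison of pointers. The names Point, same_object and identity_demo are made up for this illustration. It shows that two references to one heap object stay identical and see each other's writes, while a copy (which is what holding "the object itself" in a register would amount to) does neither.

```cpp
#include <memory>

struct Point {
    int x;
    int y;
};

// Models `a === b` for objects: equal only if both refer to the same storage.
bool same_object(const Point* a, const Point* b) {
    return a == b;
}

bool identity_demo() {
    std::unique_ptr<Point> original = std::make_unique<Point>(Point{1, 2});
    Point* alias = original.get();  // second reference to the same heap object
    std::unique_ptr<Point> copy = std::make_unique<Point>(*original);  // separate copy

    alias->x = 10;  // visible through `original`, invisible to `copy`

    return same_object(original.get(), alias)
        && !same_object(original.get(), copy.get())
        && original->x == 10
        && copy->x == 1;
}
```

    identity_demo() returns true: the alias and the original are the same object, the copy is not.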

    There is certainly some cost to creating Handles; it's faster than adding something to a std::vector though.
    Also, when passing Handles from one function to another, the called function doesn't have to re-register anything: Handles can be passed around without having to create new entries in the HandleScope's backing store.
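
    As a rough illustration of the two points above, here is a toy sketch of a handle scheme. It is not V8's actual implementation: ToyHeap, Handle, HandleScope, add_values and the fixed slot capacity are all hypothetical names and choices made for this example. A handle holds the address of a root slot rather than the object's address, so creating one is a single store into a preallocated array, leaving a scope is a single assignment, and a moving collection only has to rewrite the slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Object {
    int payload;
};

// A Handle stores the address of a root slot, not the object's address.
class Handle {
public:
    explicit Handle(Object** slot) : slot_(slot) {}
    Object* raw() const { return *slot_; }         // re-reads the slot on every access
    Object* operator->() const { return *slot_; }
private:
    Object** slot_;
};

class ToyHeap {
public:
    Object* allocate(int payload) {
        objects_.push_back(std::make_unique<Object>(Object{payload}));
        return objects_.back().get();
    }

    // Registering a root: one store and one increment into a preallocated array.
    Handle make_handle(Object* object) {
        assert(top_ < kCapacity);
        slots_[top_] = object;
        return Handle(&slots_[top_++]);
    }

    // Toy moving collection: copy every object reachable from a root slot to a
    // new address, rewrite the slot, then free all old copies.
    void collect() {
        std::vector<std::unique_ptr<Object>> to_space;
        std::unordered_map<Object*, Object*> forwarded;
        for (std::size_t i = 0; i < top_; ++i) {
            Object* old_location = slots_[i];
            auto it = forwarded.find(old_location);
            if (it == forwarded.end()) {
                to_space.push_back(std::make_unique<Object>(*old_location));
                it = forwarded.emplace(old_location, to_space.back().get()).first;
            }
            slots_[i] = it->second;
        }
        objects_ = std::move(to_space);
    }

    std::size_t top() const { return top_; }
    void reset_top(std::size_t top) { top_ = top; }

private:
    static constexpr std::size_t kCapacity = 64;
    Object* slots_[kCapacity] = {};
    std::size_t top_ = 0;
    std::vector<std::unique_ptr<Object>> objects_;
};

// Leaving a scope drops all handles created inside it with one assignment.
class HandleScope {
public:
    explicit HandleScope(ToyHeap& heap) : heap_(heap), saved_top_(heap.top()) {}
    ~HandleScope() { heap_.reset_top(saved_top_); }
    HandleScope(const HandleScope&) = delete;
    HandleScope& operator=(const HandleScope&) = delete;
private:
    ToyHeap& heap_;
    std::size_t saved_top_;
};

// The callee receives Handles and registers nothing for its inputs.
Handle add_values(ToyHeap& heap, Handle a, Handle b) {
    heap.collect();                       // pretend the allocation below triggered a GC
    int sum = a->payload + b->payload;    // handles still resolve after the move
    return heap.make_handle(heap.allocate(sum));
}

int handle_demo_sum() {
    ToyHeap heap;
    HandleScope scope(heap);
    Handle a = heap.make_handle(heap.allocate(2));
    Handle b = heap.make_handle(heap.allocate(3));
    Handle sum = add_values(heap, a, b);
    return sum->payload + a->payload + b->payload;  // 5 + 2 + 3 == 10
}

bool handle_survives_move() {
    ToyHeap heap;
    HandleScope scope(heap);
    Handle h = heap.make_handle(heap.allocate(42));
    auto before = reinterpret_cast<std::uintptr_t>(h.raw());
    heap.collect();
    auto after = reinterpret_cast<std::uintptr_t>(h.raw());
    return before != after && h->payload == 42;
}
```

    In this sketch the slot itself lives in memory, but the Handle (a plain pointer) and the int values read through it are ordinary C++ values that the compiler is free to keep in registers.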

    A very important observation is that JavaScript functions don't need Handles for their locals. When executing JavaScript, V8 carefully keeps track of the contents of the stack (i.e. spilled contents of registers), and can walk and update the stack directly. HandleScopes are only needed for C++ code dealing with JS objects, because this technique isn't possible for C++ stack frames (which are controlled by the C++ compiler). Such C++ code is typically not the most critical performance bottleneck of an app; so while its performance certainly matters, some amount of overhead is acceptable.
    (Side note: one can "blindly" (i.e. without knowledge about their contents) scan C++ stack frames and do so-called "conservative" (instead of "precise") garbage collection; this comes with its own pros and cons, in particular it makes a moving GC impossible, so is not directly relevant to your question.)
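
    The side note about conservative scanning can be sketched as follows. This is a simplified model under stated assumptions: the "stack" is simulated as a vector of raw words, and conservative_roots and demo_conservative_scan are hypothetical names. The scanner keeps alive any object whose address matches some word, and because it cannot tell a real pointer from an integer with the same bit pattern, it must not rewrite either of them, which is why objects found this way cannot be moved.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

struct Object {
    int payload;
};

// Returns every heap object whose address appears as some word on the
// simulated stack. The scanner has no type information for the words.
std::vector<const Object*> conservative_roots(
    const std::vector<std::uintptr_t>& stack_words,
    const std::vector<std::unique_ptr<Object>>& heap) {
    std::unordered_set<std::uintptr_t> words(stack_words.begin(), stack_words.end());
    std::vector<const Object*> live;
    for (const auto& object : heap) {
        if (words.count(reinterpret_cast<std::uintptr_t>(object.get())) != 0) {
            live.push_back(object.get());
        }
    }
    return live;
}

std::size_t demo_conservative_scan() {
    std::vector<std::unique_ptr<Object>> heap;
    heap.push_back(std::make_unique<Object>(Object{1}));
    heap.push_back(std::make_unique<Object>(Object{2}));
    heap.push_back(std::make_unique<Object>(Object{3}));

    std::vector<std::uintptr_t> stack_words;
    // A genuine pointer to heap[0].
    stack_words.push_back(reinterpret_cast<std::uintptr_t>(heap[0].get()));
    // Stands in for an unrelated integer (say, a hash) that happens to have
    // the same bit pattern as heap[1]'s address.
    stack_words.push_back(reinterpret_cast<std::uintptr_t>(heap[1].get()));
    // A word that matches no object.
    stack_words.push_back(0);

    // heap[0] and heap[1] are both retained; heap[2] is not. Rewriting the
    // second word to a new address would corrupt it if it really were an
    // integer, so neither object may be relocated.
    return conservative_roots(stack_words, heap).size();
}
```

    demo_conservative_scan() returns 2: one object is kept alive by a real reference and one by a lookalike integer.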

    Taking this one step further: sufficiently "hot" functions will get compiled to optimized machine code; this code is the result of careful analysis and hence can be quite aggressive about keeping values (primarily numbers) in registers as long as possible, for example for chains of calculations before the final result is eventually stored in some property of some object.

    For completeness, I'll also mention that sometimes, entire objects can be held in registers: this is when the optimizing compiler successfully performs "escape analysis" and can prove that the object never "escapes" to the outside world. A simple example would be:

    function silly(a, b) {
      let vector = {x: a, y: b};
      return vector.x + vector.y;
    }
    

    When this function gets optimized, the compiler can prove that vector never escapes, so it can skip the allocation and keep a and b in registers (or at least as "standalone values", they might still get spilled to the stack if the function is bigger and needs those registers for something else).