Tags: javascript, v8

How does dynamic object access work in V8?


I was surprised to find out that JavaScript objects are not in fact hash maps under the hood; instead they are more similar to structs. From what I understand, getting and setting properties on an object is fast because the memory location of the value is at a fixed offset, as it would be in a struct or a class. What I don't understand is how the syntax maps to that fixed offset, i.e. what happens when the compiler sees obj.a or obj['a']. Is that syntax transformed into an integer offset at run time, at compile time, or by the JIT? I guess what I'm trying to understand is how it can transform the incoming string 'a' into an integer index efficiently without doing something like index = hash('a') % objectLength.

Maybe the gap in my knowledge is that I don't fully understand how structs work at the compiler level.


Solution

  • (V8 developer here.)

    JavaScript objects are not in fact hash maps under the hood, instead they are more similar to structs.

    For the record, Bergi correctly points out that this is true in one engine, and even in that engine not always. JavaScript engines have a lot of freedom in how exactly to represent objects internally, and they do make use of that freedom.

    What I don't understand is how the syntax maps to that fixed offset, i.e. what happens when the compiler sees obj.a or obj['a']. Is that syntax transformed into an integer offset at run time, at compile time, or by the JIT?

    The system is based on caching, and "hidden classes" (sometimes referred to as "object shapes" or "[object] shape descriptors").

    When you have an object obj = {a: 42, b: "hello", c: null}, it will have a hidden class (let's call it hiddenClassA) that lists all properties and their offsets, e.g. "property a is stored at offset 12".
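    To make that concrete, here is a highly simplified sketch (in plain JavaScript, not V8's actual implementation) of the idea: a hidden class can be modeled as a shared descriptor mapping property names to slot indices, while each object stores only its values plus a pointer to that descriptor. (Real engines use byte offsets into object memory rather than array indices; the names `hiddenClassA`, `slots`, and `getProperty` are made up for illustration.)

    ```javascript
    // Hypothetical model of a hidden class: property name -> slot index.
    // Every object created with the layout {a, b, c} shares this one descriptor.
    const hiddenClassA = { a: 0, b: 1, c: 2 };

    // An object is just a reference to its hidden class plus a flat
    // array of property values.
    const obj = { hiddenClass: hiddenClassA, slots: [42, "hello", null] };

    // A property load resolves the name through the hidden class,
    // then reads directly from the resulting fixed offset.
    function getProperty(o, name) {
      return o.slots[o.hiddenClass[name]];
    }
    ```

    The key point is that the name-to-offset mapping lives in the shared hidden class, not in each object, so objects stay compact and same-shaped objects can reuse cached lookup results.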

    The first execution of a function containing a property load like obj.a will be using unoptimized code. This code will have to inspect the object, find a in its hidden class' list of properties, retrieve the correct offset from there, and then read from that offset in the object to get the property's value. The pair (hidden class, offset) is then cached for this specific property load, so the next lookup (even in still-unoptimized code) will run quite a bit faster, if another object with the same hidden class comes along next time.
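    The caching step above can be sketched as a monomorphic inline cache (again a toy model, not V8 internals; `makeCachedLoad`, `hiddenClass`, and `slots` are invented names): the first load does the slow search and remembers the (hidden class, offset) pair, and every later load on an object with the same hidden class takes the fast path.

    ```javascript
    // Toy hidden class shared by two objects with the same layout.
    const shapeA = { a: 0, b: 1 };                       // name -> slot index
    const obj1 = { hiddenClass: shapeA, slots: [42, "hello"] };
    const obj2 = { hiddenClass: shapeA, slots: [7, "world"] };

    // A per-call-site inline cache for loads of one property name.
    function makeCachedLoad(name) {
      let cachedClass = null;
      let cachedOffset = -1;
      return function load(o) {
        if (o.hiddenClass === cachedClass) {
          return o.slots[cachedOffset];    // fast path: cache hit
        }
        // Slow path: search the hidden class, then populate the cache.
        cachedClass = o.hiddenClass;
        cachedOffset = o.hiddenClass[name];
        return o.slots[cachedOffset];
      };
    }

    const loadA = makeCachedLoad("a");
    loadA(obj1);  // first call: slow lookup, fills the cache
    loadA(obj2);  // same hidden class: fast path, no search
    ```

    Note that the cache key is the hidden class, not the object, which is why the second, different object still hits the fast path.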

    If the function runs hot enough, it will eventually get optimized. The optimizing compiler looks at the hidden classes and offsets that unoptimized code has cached, and assumes that future behavior of your app will be just like past behavior, so it will emit a code sequence like:

    1. verify that obj has hidden class hiddenClassA, otherwise deoptimize
    2. load from offset 12

    where "deoptimize" means that the entire optimized code for this function will have to be thrown away, because it is apparently based on invalid assumptions, and execution will go back to unoptimized code to collect more type feedback (until a potential later re-optimization with new feedback, if it still runs hot enough). As long as it doesn't have to deopt though, the optimized code will be nearly as fast as what C would do for structs, and it won't have to do any property lookups because it just relies on the cached offsets.

    This mechanism is also why it wouldn't make sense to compile optimized code right away: things like property accesses can't reasonably be optimized when the optimizing compiler has no cached type information (generated by unoptimized execution) available. Because then the optimizing compiler would ask exactly the same question you did: "how on earth am I supposed to figure out what offset property a maps to???"