javascript arrays double v8

Is an array of ints actually implemented as an array of ints in JavaScript / V8?


There is a claim in this article that an array of ints in JavaScript is implemented as a C++ array of ints.

However, according to MDN, unless you specifically use BigInts, all numbers in JavaScript are represented as doubles.

If I do:

const arr = [0, 1, 2, 3];

What is the actual representation in the V8 engine?

The code for V8 is on GitHub, but I don't know where to look.


Solution

  • (V8 developer here.) "C++ array of ints" is a bit of a simplification, but the key idea described in that article is correct, and an array [0, 1, 2, 3] will be stored as an array of "Smis".

    What's a "Smi"? While every Number in JavaScript must behave like an IEEE 754 double, V8 internally represents a number as a "small integer" (a 31-bit signed integer value plus a 1-bit tag) when it can, i.e. when the number has an integral value in the range -2**30 to 2**30-1, to improve efficiency. Engines can generally do whatever they want under the hood, as long as things behave as if the implementation followed the spec to the letter. So when the spec (or the MDN documentation) says "all Numbers are doubles", what it really means from the engine's (or an engine developer's) point of view is "all Numbers must behave as if they were doubles".
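
    To make the range concrete, here is a minimal sketch of a check for whether a value could be stored as a Smi. The helper fitsSmiRange and the constants SMI_MIN and SMI_MAX are hypothetical names for illustration, not V8 APIs, and the bounds are simply the ones quoted above; the exact Smi width may differ between V8 builds.

```javascript
// Hypothetical helper, not a V8 API: mirrors the 31-bit range described above.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

function fitsSmiRange(value) {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&  // rejects fractions, NaN, and +/-Infinity
    !Object.is(value, -0) &&    // -0 has no integer encoding
    value >= SMI_MIN &&
    value <= SMI_MAX
  );
}

console.log(fitsSmiRange(3));           // true
console.log(fitsSmiRange(2 ** 30 - 1)); // true
console.log(fitsSmiRange(2 ** 30));     // false
console.log(fitsSmiRange(0.5));         // false
console.log(fitsSmiRange(-0));          // false
```

    Nothing in a JavaScript program can observe which representation was chosen; the check only models the engine's internal decision.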

    When an array contains only Smis, then the array itself keeps track of that fact, so that values loaded from such arrays know their type without having to check. This matters e.g. for a[i] + 1, where the implementation of + doesn't have to check whether a[i] is a Smi when it's already known that a is a Smi array.
    When the first number that doesn't fit the Smi range is stored in the array, it'll be transitioned to an array of doubles (strictly speaking still not a "C++ array", rather a custom array on the garbage-collected heap, but it's similar to a C++ array, so that's a good way to explain it).
    When the first non-Number is stored in an array, what happens depends on what state the array was in before: if it was a "Smi array", then it only needs to forget the fact that it contains only Smis. No rewriting is needed, as Smis are valid object pointers thanks to their tag bit. If the array was a "double array" before, then it does have to be rewritten, so that each element is a valid object pointer. All the doubles will be "boxed" as so-called "heap numbers" (objects on the managed heap that only wrap a double value) at this point.
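
    The transitions described above can be sketched as a toy state machine. This is only a model: the labels "smi", "double", and "generic" and the functions isSmiLike, kindAfterStore, and kindOf are made up for illustration and are not V8 identifiers. The model also assumes that an array never moves back to a more specific representation.

```javascript
// Toy model of the array representation transitions; not V8 code.
function isSmiLike(value) {
  return typeof value === "number" && Number.isInteger(value) &&
    !Object.is(value, -0) && value >= -(2 ** 30) && value <= 2 ** 30 - 1;
}

// Returns the representation an array would need after storing `value`.
function kindAfterStore(kind, value) {
  if (kind === "generic") return "generic";        // assumed one-way: stays generic
  if (typeof value !== "number") return "generic"; // first non-Number stored
  if (kind === "double") return "double";          // any Number fits a double array
  return isSmiLike(value) ? "smi" : "double";      // kind === "smi"
}

// Replays the stores for a whole list of values, starting from a Smi array.
function kindOf(values) {
  return values.reduce(kindAfterStore, "smi");
}

console.log(kindOf([0, 1, 2, 3]));    // "smi"
console.log(kindOf([0, 1, 2.5]));     // "double"
console.log(kindOf([0, 1, "two"]));   // "generic"
console.log(kindOf([0.5, {}, 1]));    // "generic"
```

    In the model, as in the description, going from "smi" to "generic" is just a change of label, whereas going from "double" to "generic" is the step where a real engine has to box every element.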

    In summary, I'd like to point out that in the vast majority of cases, there's no need to worry about any of these internal implementation tricks, or even be aware of them. I certainly understand your curiosity though! Also, array representations are one of the more common reasons why microbenchmarks that don't account for implementation details can easily be misleading by suggesting results that won't carry over to a larger app.


    Addressing comments:

    "V8 does sometimes even use int16 or lower."

    Nope, it does not. It may or may not start doing so in the future, though if anything changes, I'd guess that untagged int32 is more likely to be introduced than int16. And if the implementation does change, the observable behavior of course will not.
    If you believe that your application would benefit from int16 storage, you can use an Int16Array to enforce that, but be sure to measure whether that actually benefits you, because quite likely it won't, and may even decrease performance depending on what your app does with its arrays.
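
    As a small illustration of what opting into Int16Array means, the sketch below shows that, unlike a plain array, a typed array changes observable behavior: out-of-range values wrap and fractions are truncated. The variable name samples is arbitrary.

```javascript
// Int16Array stores each element in exactly 2 bytes.
const samples = new Int16Array([0, 1, 2, 3]);

samples[0] = 40000; // out of int16 range: wraps modulo 2**16
samples[1] = 0.5;   // fractional part is dropped

console.log(samples[0]);                // -25536
console.log(samples[1]);                // 0
console.log(samples.BYTES_PER_ELEMENT); // 2
```

    Whether this is faster or slower than a regular array depends on the workload, so it is worth measuring in the real application rather than in a microbenchmark.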

    "It may start to be a double when you make it a decimal."

    Slightly more accurately: there are several reasons why an array of Smis needs to be converted to an array of doubles, such as:

    • storing a fractional value in it, e.g. 0.5
    • storing a large value in it, e.g. 2**34
    • storing NaN or Infinity or -0 in it
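
    The last bullet may be the least obvious one. A short sketch using only standard JavaScript shows why -0 cannot be stored as a plain integer: it is observably different from 0, and an integer encoding has no way to keep that difference. The variable name negZero is arbitrary.

```javascript
const negZero = -0;

console.log(negZero === 0);         // true: strict equality hides the sign
console.log(Object.is(negZero, 0)); // false: Object.is distinguishes them
console.log(1 / negZero);           // -Infinity, whereas 1 / 0 is Infinity

// NaN and Infinity are not integral values at all.
console.log(Number.isInteger(NaN));      // false
console.log(Number.isInteger(Infinity)); // false

// 2**34 is integral but lies outside the -2**30 to 2**30-1 range.
console.log(2 ** 34 > 2 ** 30 - 1);      // true
```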