Tags: javascript, performance, memory-management, garbage-collection, v8

Do browsers always treat strings and numbers in JavaScript as immutable?


In JavaScript, do browser runtime interpreters always treat strings and numbers as immutable?

Surely, in cases where it is provably harmless, they would optimize and treat them as mutable? And if not, why not?

For instance, consider the humble for loop.

for (let i = 0; i < 1000000000000; i++) {
  console.log(i);
}

Since the i variable is scoped to the loop, and no code in the loop ever needs the "old values" of the i variable, it would make sense for the browser to simply increment the number that the symbol i points to each iteration. Otherwise, a stream of new bytes of memory will be taken up by the new values of i, for no conceivable reason ("someone might need those old values of i!"). We will have an unnecessary race between the for loop (creating new values of i in memory) and the garbage collector (killing off all the old values of i), which the loop will generally win, and we will have a stack overflow.

Oh, that's what happens, isn't it. If so, why are browsers dumb this way when they are so intelligent in optimizing the code in other ways?

There is a similar situation for strings. Consider the following.

{
   let completeWorks = "This string dictates the complete works of William Shakespeare. To be or not to be that is the question whether it is nobler in the mind..."
   completeWorks += "The End."  // <-- what happens here?
}

The string completeWorks is block-scoped and provably only lives in this block. So surely when the browser encounters the instruction completeWorks += "The End" it will just mutate completeWorks. If not, why not? Probably there is a good reason they don't do this, and I would like to learn it.


Solution

  • (V8 developer here -- as such I know very little about other browsers/engines.)

    There's no easy answer to this; implementations are complicated.

    Strings, in V8, are always immutable (after creation). One reason is that with objects being allocated on the heap, there's typically no free space after an object, so we can't just append characters to an existing string. Another reason is that keeping track of which strings can safely be mutated would add an extraordinary amount of complexity (aside from a few easier-to-detect niche cases, but if only those are supported, then the mechanism would provide much less value).
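
    To illustrate why proving that a string "can safely be mutated" is hard, here is a minimal sketch of the aliasing problem as it is observable from plain JavaScript. The names alias and tryIndexWrite are made up for this example, and the snippet only shows language semantics, not V8 internals.

    ```javascript
    // Two bindings refer to the same string value.
    let completeWorks = "To be or not to be";
    const alias = completeWorks;

    // += produces a new string value and rebinds completeWorks;
    // the value seen through alias must stay unchanged.
    completeWorks += " The End.";

    // Hypothetical helper: attempts an in-place write to a string.
    // In strict mode the write throws; in sloppy mode it is silently ignored.
    function tryIndexWrite(str) {
      "use strict";
      try {
        str[0] = "X";
        return "no error";
      } catch (e) {
        return e.name;
      }
    }

    const writeOutcome = tryIndexWrite(alias);
    ```

    Appending in place would only be safe if the engine could prove that no binding such as alias can still observe the old value.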

    V8 does have a few nifty tricks for string manipulations up its sleeve: when you take a substring of a larger string, then no characters are copied; the new string is simply a reference that says "I'm a slice of length X of that other string over there, starting at index Y". Similarly, when concatenating two strings like your completeWorks example, the new string is a reference that says "I'm the concatenation of those two other strings". (For completeness, I'll mention that there are minimum character counts below which these tricks are not applied because simply copying the characters is at least as efficient.)
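
    The two tricks described above can be sketched as a toy model. This is only an illustration of the idea: the classes FlatString, SlicedString and ConsString and the flatten helper are hypothetical and do not mirror V8's actual object layout or thresholds.

    ```javascript
    // Toy model only; not V8's real implementation.
    class FlatString {
      constructor(chars) {
        this.chars = chars;
        this.length = chars.length;
      }
      charAt(i) { return this.chars[i]; }
    }

    // "I'm a slice of length X of that other string, starting at index Y."
    class SlicedString {
      constructor(parent, offset, length) {
        this.parent = parent;
        this.offset = offset;
        this.length = length;
      }
      charAt(i) { return this.parent.charAt(this.offset + i); }
    }

    // "I'm the concatenation of those two other strings."
    class ConsString {
      constructor(first, second) {
        this.first = first;
        this.second = second;
        this.length = first.length + second.length;
      }
      charAt(i) {
        return i < this.first.length
          ? this.first.charAt(i)
          : this.second.charAt(i - this.first.length);
      }
    }

    // Characters are copied only when a flat result is actually needed.
    function flatten(str) {
      let out = "";
      for (let i = 0; i < str.length; i++) out += str.charAt(i);
      return out;
    }

    const works = new FlatString("To be or not to be");
    const ending = new FlatString(" The End.");
    const joined = new ConsString(works, ending);  // no characters copied
    const slice = new SlicedString(joined, 3, 2);  // no characters copied
    ```

    In this model, building joined and slice costs the same regardless of how long the underlying text is; neither of the original strings is modified.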

    Numbers are both more performance sensitive and easier to deal with than strings. In general, heap-allocated numbers are always immutable; but that's not the end of the story. V8 heavily uses a special representation for "Smis" ("small integers"), because many numbers in JavaScript programs fall into that bucket. Smis are not heap objects; creating a new one is as cheap as modifying one, and in fact indistinguishable (like an int in C++). For numbers out of Smi range, the optimizing compiler also performs "escape analysis" and can "unbox" non-escaping numbers, which means keeping them in a CPU register (as a plain 64-bit float) instead of allocating them on the heap in the first place, which again is even better than mutating otherwise-immutable heap objects. For the special case of numbers stored in object properties, V8 also (in some cases) uses mutable storage.
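
    Why creating a Smi is "indistinguishable" from modifying one can be sketched with pointer tagging. The following is a simplified illustration that assumes a 31-bit Smi payload with a low tag bit of 0 (a heap pointer would carry a low bit of 1); the real layout depends on platform and build configuration, and all helper names here are hypothetical.

    ```javascript
    // Simplified model of a tagged machine word, assuming a 31-bit Smi payload.
    const SMI_MIN = -(2 ** 30);
    const SMI_MAX = 2 ** 30 - 1;

    // Only integers in range (and not -0) can be represented as a Smi.
    function fitsInSmi(value) {
      return Number.isInteger(value) &&
        !Object.is(value, -0) &&
        value >= SMI_MIN &&
        value <= SMI_MAX;
    }

    function tagSmi(value) { return value << 1; }      // low bit 0 marks a Smi
    function isSmi(word) { return (word & 1) === 0; }
    function untagSmi(word) { return word >> 1; }

    // "Incrementing" just computes a new word. No memory is allocated, so
    // nothing distinguishes a "new" Smi from a "mutated" one.
    function incrementSmi(word) {
      return tagSmi(untagSmi(word) + 1);
    }

    const next = untagSmi(incrementSmi(tagSmi(41)));
    ```

    Because the value lives entirely inside the word itself, a loop counter in Smi range never produces garbage in the first place.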

    So, the answer to your question is both "yes" (e.g. when generating unoptimized code, V8 doesn't spend the time to perform analysis, so the code must conservatively assume that any old value is needed somewhere), and "no" (for the optimizing compiler, your intuition is correct that this should be avoidable; however that still doesn't mean that any numbers that were allocated on the heap will be mutated there).

    > Since the i variable is scoped to the loop

    Scoping in JavaScript is complicated. First off, JavaScript has no int i declaration. Now consider this:

    for (var i = 0; i < 100; i++) {
      // Use i here, or don't.
    }
    console.log(i);  // Prints "100".
    

    If you meant let i, then sure, you'd have a block-scoped variable. In this example, performance would be the same as with var.
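
    One reason scoping matters for this question: closures can capture the loop variable, in which case "old values" of i really are still needed. A small sketch (the array names are illustrative):

    ```javascript
    // With let, each iteration gets its own binding, so every closure
    // remembers the value of i from the iteration that created it.
    const letReaders = [];
    for (let i = 0; i < 3; i++) {
      letReaders.push(() => i);
    }
    const letValues = letReaders.map((read) => read());  // [0, 1, 2]

    // With var, there is a single binding shared by all closures,
    // so they all see the final value.
    const varReaders = [];
    for (var j = 0; j < 3; j++) {
      varReaders.push(() => j);
    }
    const varValues = varReaders.map((read) => read());  // [3, 3, 3]
    ```

    An engine can only treat the counter as a single reusable slot once it has established that nothing like letReaders exists.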

    > We will have an unnecessary race between the for loop (creating new values of i in memory) and the garbage collector (killing off all the old values of i), which the loop will generally win

    No. The garbage collector is highly adaptive; in particular, it does more work when more allocations happen. There is no way to "outrun" it. If needed, program execution is stopped while the garbage collector tries to find memory that can be freed.
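
    As a rough illustration, a loop that produces garbage on every iteration simply runs to completion. The function name churn and the iteration count are arbitrary choices for this sketch, and an optimizing compiler may remove some of these allocations entirely.

    ```javascript
    // Each iteration allocates a fresh object and array that become
    // unreachable as soon as the iteration ends.
    function churn(iterations) {
      let checksum = 0;
      for (let i = 0; i < iterations; i++) {
        const temp = { value: i, payload: [i, i + 1] };
        checksum += temp.payload[1] - temp.value;  // always adds 1
      }
      return checksum;
    }

    const result = churn(2000000);
    ```

    The loop allocates far more memory in total than it ever holds at one time, because dead objects are reclaimed while it runs.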

    > and we will have a stack overflow.

    No, stack overflows have nothing to do with object allocations, or garbage collection, or heap memory in general.
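
    For contrast, here is a minimal sketch of what does overflow the stack: call depth, not allocation volume. The function name recurseForever is made up for this example, and the error type noted in the comment is what V8 reports; other engines may use a different one.

    ```javascript
    // Unbounded recursion: every call adds a stack frame and none returns.
    function recurseForever(depth) {
      return recurseForever(depth + 1) + 1;
    }

    let errorName = "none";
    try {
      recurseForever(0);
    } catch (e) {
      errorName = e.name;  // "RangeError" in V8 (maximum call stack size exceeded)
    }

    // A long loop uses a constant amount of stack, no matter how many
    // values of i it goes through.
    let last = 0;
    for (let i = 0; i < 1000000; i++) {
      last = i;
    }
    ```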