Tags: javascript, arrays, v8

JavaScript array performance drop at 13k-16k elements


I am doing some performance testing on the creation and editing of arrays and noticed some weird characteristics around arrays with about 13k-16k elements.

The graphs below show the time per element it takes to create an array and to read from it (in this case, summing the numbers in it). capacity and push refer to the way the array was created (a minimal sketch follows the list):

  • capacity: const arr = new Array(length) and then arr[i] = data
  • push: const arr = []; and then arr.push(data)
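
For concreteness, here is a minimal sketch of the two variants and of the read step (illustrative only; the actual benchmark code is in the linked repository):

// Illustrative sketch, not the actual benchmark code.
const length = 16000; // an example size in the range where the drop shows up
const data = 1;

// "capacity": pre-allocate the array, then write by index
const withCapacity = new Array(length);
for (let i = 0; i < length; i++) {
  withCapacity[i] = data;
}

// "push": start empty and let the backing store grow as needed
const withPush = [];
for (let i = 0; i < length; i++) {
  withPush.push(data);
}

// "read": sum the numbers in the array
let total = 0;
for (let i = 0; i < length; i++) {
  total += withCapacity[i];
}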

[Graphs: time per element for array creation, and for operations on the array (summing), by array length]

As you can see, in both cases (creating an array and reading from it) there is a performance reduction of about 2-3 times compared to the per-element performance at about 1k fewer elements.
When creating an array using the push method, this jump happens a bit earlier than when creating it with the correct capacity beforehand. I assume this is because, when pushing to an array that is already at maximum capacity, more extra capacity is added than is actually needed (to avoid having to grow again soon), so the threshold for the slower performance path is hit earlier.
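
If V8 grows a full backing store on push to roughly (i + 1) + (i + 1) / 2 + 16 slots when element i is added (an assumption about a V8 implementation detail, but consistent with the FixedArray[17] backing store shown for a length-1 push-built array in the %DebugPrint output further down), then a push-built array's capacity crosses the large-array threshold noticeably before its length does:

// Assumed growth rule (a V8 internal detail that may change): growing for
// element index i gives a capacity of (i + 1) + (i + 1) / 2 + 16.
// Estimate the length at which a push-built array's capacity first
// exceeds 16384 elements (the threshold identified in the answer below).
let capacity = 0;
let crossingLength = 0;
for (let i = 0; capacity < 16384; i++) {
  if (i >= capacity) {
    capacity = (i + 1) + ((i + 1) >> 1) + 16;
  }
  crossingLength = i + 1;
}
console.log(`capacity ${capacity} reached at length ${crossingLength}`);
// -> capacity 22583 reached at length 15045 (under the assumed rule),
//    well below the 16384 elements needed with exact pre-allocation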

If you want to see the code or test it yourself: github

My question: why does the performance drop at around 13k-16k elements?

To me it seems that larger arrays are treated differently in V8 starting at around 13k-16k elements in order to improve performance for them, but the cut-off point (at least in my code) is slightly too early, so the performance drops before the optimizations bring any benefit.

You can see the per-element performance improvements tapering off after about 500 elements and picking up again after the drop.

Sadly I can't find any information about that.

Also, if you happen to have any idea why there are those spikes at the end of creating with capacity and summing with push, feel free to let me know :)

Edit:

As suggested by @ggorlen, I ran the same test on a different machine (with a different, weaker CPU and less RAM) to rule out caching as the reason for the observed behavior. The results look very similar.

[Graphs: time per element for array creation and for operations, by array length, on the second machine]

Edit:

I ran node with the --allow-natives-syntax flag to debug-log the created arrays with %DebugPrint(array);, hoping to see a difference between the different array lengths, but besides the length and the memory addresses they all look the same. Here is one of each as an example:

// For array created with capacity
DebugPrint: 000002CB8E1ACE19: [JSArray]
 - map: 0x035206283321 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties]
 - prototype: 0x036b86245b19 <JSArray[0]>
 - elements: 0x02cb8e1ace39 <FixedArray[1]> [HOLEY_SMI_ELEMENTS]
 - length: 1
 - properties: 0x0114c5d01309 <FixedArray[0]>
 - All own properties (excluding elements): {
    00000114C5D04D41: [String] in ReadOnlySpace: #length: 0x03f907ac1189 <AccessorInfo> (const accessor descriptor), location: descriptor
 }

0000035206283321: [Map]
 - type: JS_ARRAY_TYPE
 - instance size: 32
 - inobject properties: 0
 - elements kind: HOLEY_SMI_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - back pointer: 0x035206283369 <Map(PACKED_SMI_ELEMENTS)>
 - prototype_validity cell: 0x03f907ac15e9 <Cell value= 1>
 - instance descriptors #1: 0x009994a6aa31 <DescriptorArray[1]>
 - transitions #1: 0x009994a6a9d1 <TransitionArray[4]>Transition array #1:
     0x0114c5d05949 <Symbol: (elements_transition_symbol)>: (transition to PACKED_DOUBLE_ELEMENTS) -> 0x0352062832d9 <Map(PACKED_DOUBLE_ELEMENTS)>

 - prototype: 0x036b86245b19 <JSArray[0]>
 - constructor: 0x031474c124e9 <JSFunction Array (sfi = 000003CECD93C3A9)>
 - dependent code: 0x0114c5d01239 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
 - construction counter: 0


// For array created with push
DebugPrint: 000003B09882CE19: [JSArray]
 - map: 0x02ff94f83369 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
 - prototype: 0x0329b3805b19 <JSArray[0]>
 - elements: 0x03b09882ce39 <FixedArray[17]> [PACKED_SMI_ELEMENTS]
 - length: 1
 - properties: 0x03167aa81309 <FixedArray[0]>
 - All own properties (excluding elements): {
    000003167AA84D41: [String] in ReadOnlySpace: #length: 0x02094f941189 <AccessorInfo> (const accessor descriptor), location: descriptor
 }

000002FF94F83369: [Map]
 - type: JS_ARRAY_TYPE
 - instance size: 32
 - inobject properties: 0
 - elements kind: PACKED_SMI_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - back pointer: 0x03167aa81599 <undefined>
 - prototype_validity cell: 0x02094f9415e9 <Cell value= 1>
 - instance descriptors #1: 0x00d25122aa31 <DescriptorArray[1]>
 - transitions #1: 0x00d25122aa01 <TransitionArray[4]>Transition array #1:
     0x03167aa85949 <Symbol: (elements_transition_symbol)>: (transition to HOLEY_SMI_ELEMENTS) -> 0x02ff94f83321 <Map(HOLEY_SMI_ELEMENTS)>

 - prototype: 0x0329b3805b19 <JSArray[0]>
 - constructor: 0x009ff8a524e9 <JSFunction Array (sfi = 0000025A84ABC3A9)>
 - dependent code: 0x03167aa81239 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
 - construction counter: 0
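
For reference, output like the above can be produced roughly as follows (a sketch, not the actual benchmark code; %DebugPrint only parses when the flag is passed):

// Run with: node --allow-natives-syntax debug-print.js
const withCapacity = new Array(1);
withCapacity[0] = 1;
%DebugPrint(withCapacity); // HOLEY_SMI_ELEMENTS, FixedArray[1], as above

const withPush = [];
withPush.push(1);
%DebugPrint(withPush); // PACKED_SMI_ELEMENTS, FixedArray[17], as above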

Edit:

The performance drop for the summing happens when going from an array of size 13_994 to 13_995:

[Graph: per-element summing time, showing the jump between 13_994 and 13_995 elements]


Solution

(V8 developer here.)

TL;DR: Nothing to see here, just microbenchmarks producing confusing artifacts.

There are two separate effects here:

1. What happens at 16384 elements is that the backing store is allocated in "large object space", a special region of the heap that's optimized for large objects. In Chrome, where pointer compression is enabled, this happens at exactly twice as many elements as in Node, where pointer compression is off. It has the consequence that the allocation itself can no longer happen as an inlined sequence of instructions directly in optimized code; instead it's a call to a C++ function, which, aside from having some call overhead, is also a more generic implementation and as such a bit slower (there might be some optimization potential, not sure). So it's not an optimization that kicks in too early; it's just a cost that's paid by huge objects. And it'll only show up prominently in (tiny microbenchmark?) cases that allocate many large arrays and then don't do much with them. (A rough size calculation for this threshold is sketched after this list.)

2. What happens at 13995 elements is that for the specific function sum, that's when the optimizing compiler kicks in to "OSR" (on-stack replace) the function, i.e. the function is replaced with optimized code while it is running. That's a certain one-off cost no matter when it happens, and it will pay for itself shortly after. So perceiving this as a specific hit at a specific time is a typical microbenchmarking artifact that's irrelevant in real-world usage. (For instance, if you ran the test multiple times in the same process, you wouldn't see a step at 13995 any more. If you ran it with multiple sizes, then chances are OSR wouldn't be needed (as the function could switch to optimized code the next time it's called) and this one-off cost wouldn't happen at all.) One way to observe this OSR event is sketched below.
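
A back-of-the-envelope check of the 16384-element figure in point 1, assuming 8-byte element slots in Node (no pointer compression) and a limit of roughly 128 KiB for objects in regular heap space, both of which are V8 internals that may change:

// Each tagged/SMI element occupies 8 bytes without pointer compression
// (4 bytes with compression, as in Chrome, which gives the factor of two
// mentioned in point 1).
const bytesPerElement = 8;
// Assumed size limit for regular heap objects (~128 KiB); larger backing
// stores go to large object space. Ignores the FixedArray header.
const regularObjectLimit = 128 * 1024;
console.log(regularObjectLimit / bytesPerElement); // 16384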
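
And a sketch of one way to observe the OSR event from point 2 (the exact trace output and heuristics vary across V8/Node versions; sum here stands in for the benchmark's summing function):

// Run with: node --trace-opt osr-demo.js
// Look for lines showing `sum` being optimized; OSR (on-stack replacement)
// means the running loop is switched to optimized code mid-execution.
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}

// A single long call; depending on the heuristics you may need a larger
// array or repeated calls before the optimizer kicks in.
const arr = Array.from({ length: 100000 }, (_, i) => i);
console.log(sum(arr));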