Why is i 2 slower than (i + 1) 2 in V8

Consider the following snippets and results from running:

Snippet 1:

let final_result, final_result2;
let start = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result = Math.pow(i + 1, 2);
}
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result2 = (i + 1) ** 2;
}
let end2 = new Date();
console.log(end2 - start2); // Output 2

Snippet 2:

let final_result, final_result2;
let start = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result = Math.pow(i, 2);
}
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
for(let i = 0; i < 100000000; i++) {
    final_result2 = i ** 2;
}
let end2 = new Date();
console.log(end2 - start2); // Output 2

Snippet 3:

let final_result, final_result2;

function t1(){
    for(let i = 0; i < 100000000; i++) {
        final_result = Math.pow(i, 2);
    }
}

function t2(){
    for(let i = 0; i < 100000000; i++) {
        final_result2 = i ** 2;
    }
}

let start = new Date();
t1();
let end = new Date();
console.log(end - start); // Output 1

let start2 = new Date();
t2();
let end2 = new Date();
console.log(end2 - start2); // Output 2

Results:

Output	Firefox 88 (ms)	Edge 90 (ms)
Snippet 1 - Output 1	63	467
Snippet 1 - Output 2	63	487
Snippet 2 - Output 1	63	468
Snippet 2 - Output 2	63	1180
Snippet 3 - Output 1	64	480
Snippet 3 - Output 2	64	1200

These results were obtained consistently over numerous tests and the number being added did not affect performance, i.e. other similar operations ((i * 1) ** 2, (i + i) ** 2, etc.) all resulted in a speed up over just using i ** 2. Meanwhile Math.pow is consistent in its speed.

How can repeated calculations of (i + n) ** 2 be faster than i ** 2 when the latter has less to calculate when using a V8 browser (Edge and Chrome both had similar results), meanwhile Firefox's runtime was consistent between the 2 snippets.

Solution

How can repeated calculations of (i + n) ** 2 be faster than i ** 2 when the latter has less to calculate?

That's because this microbenchmark is not measuring exponentiation time. Beware of misleading microbenchmarks!

Instead, what it is measuring is:

HeapNumber allocations and related operations (write barriers, garbage collection),
in the slower cases, function call overhead (as opposed to inlining) and some of the checks dictated by the JS spec (that didn't get optimized away).

One of the fundamental architectural differences between Spidermonkey and V8 is that the former uses "NaN-boxing" whereas the latter uses "pointer-tagging". Both have pros and cons; in this particular case the consequence is that V8 needs to allocate a fresh "HeapNumber" for every result that you write to final_result, whereas Firefox can just write the raw IEEE double there. (This is pretty much the worst-case comparison for the pointer-tagging approach.) That explains the speed difference between the two engines. This is easy to verify by modifying the test such that it stores the results into an array (i.e. let final_result = []; and final_result[0] = ...) -- in which case V8's "array elements kind" tracking kicks in and it stores raw doubles as well.

The slower cases using ** instead of Math.pow appear to be untapped optimization potential in V8. There's a key comment in the source:

// We currently don't optimize exponentiation based on feedback.

The commit that introduced this comment gives more background: the ** operator used to be "syntactic sugar" for Math.pow, and V8 actually implemented it by "desugaring" it to the latter; but with the introduction of BigInt support, it had to stop doing that. As usual, the first implementation aimed at correctness rather than maximum performance, and this first implementation is still in use today (which probably implies that this detail isn't particularly important in real-world code... otherwise someone would have complained before). That means that V8's optimizing compiler currently lacks the type feedback it would need to inline the exponentiation; instead it emits a call to a "built-in", which has to allocate a HeapNumber for the result it wants to return. But, being a reasonably clever optimizing compiler, it can propagate type information from elsewhere; that's why adding some other operation (such as (i+1) ** 2, or even (i+0) ** 2) has a beneficial impact in this case.

Summary: Don't be fooled by microbenchmarks. Drawing useful conclusions from a microbenchmark really requires inspecting what the engine is doing under the hood; otherwise there's an overwhelming chance that you're not measuring what you think you're measuring. Also, this is a nice illustration of the other problem with microbenchmarks: chances are that in your real code, something is different about the surrounding circumstances (e.g. you may be storing into arrays, or you may be doing additional operations that generate type feedback, etc), so the results from the microbenchmark likely aren't even applicable.

Why is i ** 2 slower than (i + 1) ** 2 in V8

Why is i 2 slower than (i + 1) 2 in V8