I was comparing two functions' performance in Chrome's console when I stumbled across something I can't explain.
const A = 3,
      B = 2,
      C = 4,
      D = 2;

const mathCompare = (a1, a2, b1, b2) => {
    return Math.abs(Math.log(a1 / a2)) < Math.abs(Math.log(b1 / b2));
};

const logicCompare = (a1, a2, b1, b2) => {
    return (a1 > a2 ? a1 / a2 : a2 / a1) < (b1 > b2 ? b1 / b2 : b2 / b1);
};

const runLooped = (j) => {
    for (let i = 0; i < 4; i++) {
        jsCompare(mathCompare, logicCompare, j);
    }
};

const jsCompare = (f1, f2, iterations) => {
    let a = jsPerf(f1, iterations);
    let b = jsPerf(f2, iterations);
    console.warn(iterations + " iterations:\n" + f1.name + ": " + a + " ms\n" + f2.name + ": " + b + " ms\n" + "delta: " + (a - b) + " ms");
};

const jsPerf = (f, iterations) => {
    let start = performance.now();
    for (let i = 0; i < iterations; i++) {
        f(A, B, C, D);
    }
    return performance.now() - start;
};

runLooped(10000000);
Since performance only changed after a complete set of iterations - i.e. one call of jsCompare() - I decided to try again with a less decomposed structure:
const A = 3,
      B = 2,
      C = 4,
      D = 2;

const mathCompare = (a1, a2, b1, b2) => {
    return Math.abs(Math.log(a1 / a2)) < Math.abs(Math.log(b1 / b2));
};

const logicCompare = (a1, a2, b1, b2) => {
    return (a1 > a2 ? a1 / a2 : a2 / a1) < (b1 > b2 ? b1 / b2 : b2 / b1);
};

const compareRaw = (f1, f2, maxI, maxJ) => {
    for (let i = 0; i < maxI; i++) {
        let j,
            a = performance.now();
        for (j = 0; j < maxJ; j++) {
            f1(A, B, C, D);
        }
        let b = performance.now();
        for (j = 0; j < maxJ; j++) {
            f2(A, B, C, D);
        }
        let c = performance.now();
        console.warn(j + " iterations:\n" + f1.name + ": " + (b - a) + " ms\n" + f2.name + ": " + (c - b) + " ms\n" + "delta: " + ((b - a) - (c - b)) + " ms");
    }
};

const runRaw = (i) => {
    compareRaw(mathCompare, logicCompare, 4, i);
};

runRaw(10000000);
A completely different outcome. Results stabilize around set 3, after some fluctuations.
[Screenshot: results in Chrome 79.0.3945.130]
[Screenshot: results in Vivaldi 2.6.1566.49 (V8 7.5.288.30)]
I was intrigued and tried everything again, but this time with random numbers. The project I was testing this for would obviously never call these functions n * 10M times with the same parameters.
const mathCompare = (a1, a2, b1, b2) => {
    return Math.abs(Math.log(a1 / a2)) < Math.abs(Math.log(b1 / b2));
};

const logicCompare = (a1, a2, b1, b2) => {
    return (a1 > a2 ? a1 / a2 : a2 / a1) < (b1 > b2 ? b1 / b2 : b2 / b1);
};

const compareRawRandom = (f1, f2, maxI, maxJ) => {
    let randoms = [...Array(maxJ + 3)].map(() => Math.floor(Math.random() * 10));
    for (let i = 0; i < maxI; i++) {
        let j,
            a = performance.now();
        for (j = 0; j < maxJ; j++) {
            f1(randoms[j], randoms[j + 1], randoms[j + 2], randoms[j + 3]);
        }
        let b = performance.now();
        for (j = 0; j < maxJ; j++) {
            f2(randoms[j], randoms[j + 1], randoms[j + 2], randoms[j + 3]);
        }
        let c = performance.now();
        console.warn(j + " iterations:\n" + f1.name + ": " + (b - a) + " ms\n" + f2.name + ": " + (c - b) + " ms\n" + "delta: " + ((b - a) - (c - b)) + " ms");
    }
};

const runRawRandom = (i) => {
    compareRawRandom(mathCompare, logicCompare, 4, i);
};

const jsCompareRandom = (f1, f2, iterations) => {
    let randoms = [...Array(iterations + 3)].map(() => Math.floor(Math.random() * 10));
    let a = jsPerfRandom(f1, iterations, randoms);
    let b = jsPerfRandom(f2, iterations, randoms);
    console.warn(iterations + " iterations:\n" + f1.name + ": " + a + " ms\n" + f2.name + ": " + b + " ms\n" + "delta: " + (a - b) + " ms");
};

const jsPerfRandom = (f, iterations, randoms) => {
    let start = performance.now();
    for (let i = 0; i < iterations; i++) {
        f(randoms[i], randoms[i + 1], randoms[i + 2], randoms[i + 3]);
    }
    return performance.now() - start;
};

const runRandomLooped = (j) => {
    for (let i = 0; i < 4; i++) {
        jsCompareRandom(mathCompare, logicCompare, j);
    }
};

runRandomLooped(10000000);
runRawRandom(10000000);
runRandomLooped() shows the same oddly low time for the first 10M set of mathCompare().
runRawRandom(), the less decomposed version doing exactly the same calculations, again stabilizes after 2 sets of 10M. This time, however, both functions show identical performance of ~23 ms for 10M calls.
This only shows up on Chrome/Chromium-based browsers; I tested Chrome 79.0.3945.130 and Vivaldi 2.6.1566.49 (V8 7.5.288.30).
I also tested on Firefox 72.0.2, which showed constant performance across sets and with both ways of looping.
I'm running an AMD FX-8350 on current Win10.
I guess it has something to do with how V8 optimizes at runtime, but I would not have expected performance to go down in that case.
V8 developer here. As wOxxOm points out, this is mostly an illustration of the pitfalls of microbenchmarking.
First off:
"Unexpected de-optimisation..."
Nope, deoptimization is not the issue here (it's a very specific term with a very specific meaning). You mean "slowdown".
"...on high-iteration count"
Nope, high iteration count is also not the issue here. While you could say that the lack of warmup in your test is contributing to the results you see, that contribution is the part you found less surprising.
One mechanism to be aware of is "on-stack replacement": functions with (very) long-running loops will get optimized while the loop is executing. It doesn't make sense to do that on a background thread, so execution is interrupted while optimization happens on the main thread. If the code after the loop hasn't been executed yet and therefore doesn't have type feedback, then the optimized code will be thrown away ("deoptimized") once execution reaches the end of the loop, to collect type feedback while executing unoptimized bytecode. In case of another long-running loop like in the example here, the same OSR-then-deopt dance will be repeated. That means some nontrivial fraction of what you're measuring is optimization time. This explains most of the variance you see in runRawRandom before times stabilize.
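As a practical aside, the usual way to keep that optimization time out of the measured window is to run a few untimed warmup rounds before the timed one. A minimal sketch of the idea, reusing A/B/C/D from the question (the measureWithWarmup helper and its default round count are made up for illustration, not anything V8 requires):

// Sketch only: untimed warmup rounds let V8 collect type feedback and
// optimize (including OSR of the hot loop) before measurement starts.
const measureWithWarmup = (f, iterations, warmupRounds = 3) => {
    for (let w = 0; w < warmupRounds; w++) {
        for (let i = 0; i < iterations; i++) {
            f(A, B, C, D);    // untimed warmup call
        }
    }
    const start = performance.now();
    for (let i = 0; i < iterations; i++) {
        f(A, B, C, D);        // timed call, ideally running already-optimized code
    }
    return performance.now() - start;
};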
Another effect to be aware of is inlining. The smaller and faster the function in question, the bigger the relative call overhead, which is avoided when you write the benchmark such that the function can get inlined. Additionally, inlining often unlocks additional optimization opportunities: in this case, the compiler can see after inlining that the comparison result is never used, so it dead-code-eliminates all the comparisons. This explains why runRandomLooped is so much slower than runRawRandom: the latter benchmarks empty loops. The former's very first iteration is "fast" (=empty) because V8 at that point inlines mathCompare for the f(...) call in jsPerfRandom (because that's the only function it's ever seen there), but soon after it realizes "whoops, various different functions are getting called here", so it deopts and won't try to inline again on subsequent optimization attempts.
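If you want to check whether dead-code elimination is hollowing out a loop like this, one common trick is to actually consume the results, e.g. by folding them into an accumulator that gets reported afterwards. A rough sketch (the sink variable and the jsPerfRandomKept name are my additions for illustration):

// Sketch only: accumulating every result into "sink" keeps the comparisons
// observably live, so the compiler cannot eliminate them as dead code.
let sink = 0;
const jsPerfRandomKept = (f, iterations, randoms) => {
    const start = performance.now();
    for (let i = 0; i < iterations; i++) {
        sink += f(randoms[i], randoms[i + 1], randoms[i + 2], randoms[i + 3]) ? 1 : 0;
    }
    return performance.now() - start;
};
// Print sink at the end (e.g. console.log(sink)) so the value is really used.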
If you care about the details, you can use some combination of the flags --trace-opt --trace-deopt --trace-osr --trace-turbo-inlining --print-opt-code --code-comments to investigate the behavior in depth. Be warned though that while this exercise is likely going to cost you significant time, what you can learn from the behavior of a microbenchmark is, in all likelihood, not going to be relevant for real-world use cases.
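For example, assuming the snippets above live in a standalone benchmark.js (on older Node versions you'd need performance from perf_hooks), you could run them under Node.js, which forwards V8 flags; note that --print-opt-code only produces disassembly if the binary was built with disassembler support:

node --trace-opt --trace-deopt --trace-osr benchmark.js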
To illustrate: taken at face value, these measurements would support any of the following conclusions:

- mathCompare is much slower than logicCompare (first set of runRandomLooped)
- mathCompare and logicCompare are exactly equally fast (stabilized runRawRandom)
- either comparison costs about 2.3 ns per call (~23 ms / 10M calls)

In practice though, all three observations are false (which is not overly surprising, given that two of them are direct contradictions of each other).