A core skill in programming is code re-use (DRY). However, since a function is typically¹ compiled only once, calling the same function with slightly differently shaped data makes it polymorphic, even when the shapes never mix within a single call - each object shape is confined to its own call-site. This is detrimental to inline caches and other optimizations.
const f = a => {
let diagonal = 0;
for (let i = 0; i < 256; i += 16 + 1) diagonal += a[i].x;
return diagonal;
};
// One place in code:
const a1 = Array.from({ length: 256 }, () => ({ x: 0 }));
f(a1);
// Different place, almost unrelated:
const a2 = Array.from({ length: 256 }, () => ({ x: 0, hiddenClassChange: 0 }));
f(a2);
Since this pattern is so common, it also shows up in bottlenecks. Is there something V8 already does to solve this, is there a misunderstanding on my side, or is there a way to solve it other than Ctrl+C Ctrl+V on the textual representation of the function in the code base?
In the scenario that kicked off this question, two places have different ways of addressing (x, y) => data, while the iteration stays the same. I'd like to not copy&paste the iteration code, but if the engine otherwise can't inline the addressing schemes - potentially leading to even more missed optimizations (escape analysis, etc.) - I might have to.
[1]: When a function is inlined, can the code embedded into the outer function have inline caches different from those at another call-site? If that were the case, and it happened in a predictable, reliable way, it could solve the problem.
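For reference, the closest programmatic equivalent to copy&paste I've found is compiling a fresh copy of the function per call-site with `new Function`. Since each copy is compiled separately, I'd expect each to get its own inline caches, though the eval-like drawbacks (no closure capture, CSP restrictions) make it unattractive; the helper name is made up:

```javascript
// Hypothetical helper: compiles an independent copy of the hot function.
// Each `new Function` call is compiled separately, so each copy should
// collect its own type feedback instead of sharing one set of caches.
const makeDiagonalSum = () => new Function("a", `
  let diagonal = 0;
  for (let i = 0; i < 256; i += 16 + 1) diagonal += a[i].x;
  return diagonal;
`);

// One compiled copy per call-site, without textual duplication:
const sumForA1 = makeDiagonalSum();
const sumForA2 = makeDiagonalSum();
```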
As was asked, here is an example. Note that this way of benchmarking isn't generally applicable; it should only serve to show that there is a difference. The real case is too large to post here.
const f = a => {
let diagonal = 0;
for (let i = 0; i < 256; i += 16 + 1) diagonal += a[i].x;
return diagonal;
};
// Copy of `f`.
const g = a => {
let diagonal = 0;
for (let i = 0; i < 256; i += 16 + 1) diagonal += a[i].x;
return diagonal;
};
// Could write this more compactly, but this is faster to understand.
const a1 = Array.from({ length: 256 }, () => ({ x: 0, a: 0 }));
const a2 = Array.from({ length: 256 }, () => ({ x: 0, b: 0 }));
const a3 = Array.from({ length: 256 }, () => ({ x: 0, c: 0 }));
const a4 = Array.from({ length: 256 }, () => ({ x: 0, d: 0 }));
const a5 = Array.from({ length: 256 }, () => ({ x: 0, e: 0 }));
const a6 = Array.from({ length: 256 }, () => ({ x: 0, f: 0 }));
const a = [a1, a2, a3, a4, a5, a6];
// Prime `f` and `g`.
for (let i = 0; i < 100000; i++) {
for (const an of a) f(an);
g(a1);
}
// Very naive time measure.
let start = performance.now();
for (let i = 0; i < 100000; i++) f(a1);
console.log(performance.now() - start);
start = performance.now();
for (let i = 0; i < 100000; i++) g(a1);
console.log(performance.now() - start);
As a quick addition to @JonasWilms' answer:
Yes, this effect is a thing; no, there's no general way to avoid it, neither in the engine nor in user code. Flexible/dynamic/polymorphic code comes at a cost.
We've spent quite some time thinking about ways to improve there, but for several reasons it's hard to come up with a plan that's actually better across the board.
What makes us particularly sad is that you can run into a very similar issue with inheritance:
class Base {
constructor(a) { this.a = a; }
f() {
let diagonal = 0;
for (let i = 0; i < 256; i += 16 + 1) {
diagonal += this.a[i]; // With enough subclasses, "this.a" becomes megamorphic.
}
return diagonal;
}
}
class Sub1 extends Base { ... }
class Sub2 extends Base { ... }
class Sub2_1 extends Sub2 { ... }
// etc...
(new Sub1(...)).f();
(new Sub2(...)).f();
(new Sub2_1(...)).f();
// etc...
Depending on your situation, sometimes for particularly hot code this can be worked around by designing an object model that keeps some common parts monomorphic. For example, instead of deriving from a base class and adding this.* fields in subclasses, you add a single this.kind_specific_data field to the base class, and sub-types of objects use that to store a single object containing their specific data. Taking an AST to illustrate, instead of:
class Node {
constructor() { this.id = GetNextNodeId(); }
}
class BinaryOperation extends Node {
constructor(left, right) { super(); this.left = left; this.right = right; }
}
you'd do:
class Node {
constructor(data = null) { this.id = GetNextNodeId(); this.data = data; }
}
function MakeBinaryOperation(left, right) {
return new Node({left, right});
}
and then algorithms iterating over your AST that only need to access Node IDs (and other common-to-all-nodes infrastructure) can be very fast because all accesses are monomorphic. You'll probably want to add some custom scheme to distinguish the various sub-types, maybe a .type property.
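A minimal self-contained sketch of such a tagged design (the `type` strings, the static id counter, and the factory names here are made up for illustration):

```javascript
class Node {
  static nextId = 0;
  // Every node has the exact same shape {id, type, data}, so accesses
  // like `node.id` stay monomorphic across all node kinds.
  constructor(type, data = null) {
    this.id = Node.nextId++;
    this.type = type;
    this.data = data;
  }
}

// Factories instead of subclasses; kind-specific fields live in `data`.
const makeLiteral = (value) => new Node("literal", { value });
const makeBinaryOperation = (left, right) => new Node("binop", { left, right });

// Kind-specific behavior dispatches on the tag:
const isLeaf = (node) => node.type === "literal";
```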
This pattern isn't always applicable, and can be more verbose than inheritance-based designs. On the bright side, it can have unrelated benefits, such as improving modularity, testability, and reusability of your code. The performance differences can also be quite small, so in the majority of cases, there's no need to worry about it.