Tags: javascript, node.js, v8

Why does my benchmark for memory usage in Node.js seem wrong?


I wanted to test memory usage for objects in Node.js. My approach was simple: I first use process.memoryUsage().heapUsed / 1024 / 1024 to get a baseline memory reading. I also have an array of sizes, i.e. the numbers of entries in the objects, const SIZES = [100, 500, 1000, 5000, 10000, ...], and I plan to loop through it, create an object of each size, and compare the memory usage at that point with the baseline.

function memoryUsed() {
    const mbUsed = process.memoryUsage().heapUsed / 1024 / 1024 
    return mbUsed
}


function createObject(size) {
  const obj = {};
  for (let i = 0; i < size; i++) {
    obj[Math.random()] = i;
  }

  return obj;
}

const SIZES = [100, 500, 1000, 5000, 10000, 50000, 100000, 500000, 1000000]

const memoryUsage = {}

function fn() {
  SIZES.forEach(size => {
    const before = memoryUsed()
    const obj = createObject(size)
    const after = memoryUsed()
    const diff = after - before
    memoryUsage[size] = diff
  })
}

fn()

but the results didn't look correct:

{
  '100': 0.58087158203125,
  '500': 0.0586700439453125,
  '1000': 0.15680694580078125,
  '5000': 0.7640304565429688,
  '10000': 0.30365753173828125,
  '50000': 7.4157257080078125,
  '100000': 0.8076553344726562
}

These results don't make sense. Also, the memoryUsage object that records the measurements itself takes up more memory as it grows, so I think it adds some overhead of its own.

What are some of the more robust and proper ways to benchmark memory usage in node.js?


Solution

  • The key thing missing is controlling for the garbage collector kicking in at "random" times. To attribute used memory to specific actions, you need to manually trigger full GC runs before taking a measurement. Concretely, modify memoryUsed so it reads:

    function memoryUsed() {
      gc(); // the global gc() function only exists when Node runs with --expose-gc
      const mbUsed = process.memoryUsage().heapUsed / 1024 / 1024;
      return mbUsed;
    }
    

    and run the test in Node with --expose-gc, which makes the global gc() function available (a minimal guard sketch follows the numbers below). Then you'll get reasonable numbers:

    {
      '100': 0.20072174072265625,
      '500': 0.0426025390625,
      '1000': 0.08499908447265625,
      '5000': 0.37823486328125,
      '10000': 0.7519683837890625,
      '50000': 4.9071807861328125,
      '100000': 9.80963134765625,
      '500000': 43.04571533203125,
      '1000000': 86.08901977539062
    }
    

    The first result (for 100) is obviously too high; not sure why (and if I repeat the test for 100 at the end, its result is in line with the others). The other numbers check out: for a 2x or 5x increase in the number of properties, memory consumption increases by approximately the same 2x or 5x.
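    If you launch the script without that flag, gc is simply undefined and the call throws a ReferenceError. A minimal guard at the top of the script makes this explicit (a sketch; the benchmark.js file name is just an example):

    if (typeof gc !== 'function') {
      console.error('Please run this script with: node --expose-gc benchmark.js');
      process.exit(1);
    }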


    My high-level comment is that I'm not sure what you're trying to measure here. The majority of memory used in this scenario is spent on strings of the form "0.38459934255705686", whereas your description seems to indicate that you're more interested in objects.
    The marginal cost of one object property depends on the state of the object. When several objects share a shape/"hidden class", each property in an object takes just one pointer: 4 bytes in browsers (32-bit platforms, or 64-bit with pointer compression), 8 bytes in Node (64-bit without pointer compression). When an object is in dictionary mode, each additional property takes about 6 pointers on average (depending on when the dictionary's backing store needs to be grown), so 24 or 48 bytes depending on pointer size. The latter is the scenario that this test is creating.
    In both cases, this is just the additional size of the object holding the property; the property's name and value might of course need additional memory.
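    To see the two cases side by side, here's a minimal sketch (it assumes the script runs with --expose-gc; the 100000 count is arbitrary) that compares many shape-sharing objects with a single dictionary-mode object. Note that the distinct string keys in the second case cost extra memory of their own, as discussed above:

    function heapMB() {
      gc();
      return process.memoryUsage().heapUsed / 1024 / 1024;
    }

    // Case 1: many small objects with identical property names share one
    // shape/"hidden class", so each property slot costs about one pointer.
    let before = heapMB();
    const shaped = [];
    for (let i = 0; i < 100000; i++) {
      shaped.push({ a: i, b: i, c: i, d: i });
    }
    console.log('shape-sharing:', (heapMB() - before).toFixed(2), 'MB');

    // Case 2: one object with 100000 distinct property names ends up in
    // dictionary mode, where each property costs several pointers on average.
    before = heapMB();
    const dict = {};
    for (let i = 0; i < 100000; i++) {
      dict['key' + i] = i;
    }
    console.log('dictionary mode:', (heapMB() - before).toFixed(2), 'MB');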