r performance benchmarking microbenchmark

Relative Benchmarking in R Adjusted for Local Machine Specs


I have a question regarding benchmarking in R: how can I appropriately estimate the speed of functions relative to a user-defined function?

Say I have 3 user-defined functions f, g and h, and I want to report how g and h perform in terms of speed compared to f. Measuring each of these is not a problem (I know of several libraries), and dividing the speed of g by the speed of f is a nice idea. But how can I adjust this measurement so that it does not depend too much on my local machine and OS (or is this even necessary)? Of course, 100% precision seems unlikely, but my ultimate goal is a metric that gives a user or a student a feeling for how big the improvement in speed is without having to compare my specs with hers.

So, what am I looking for?

  • A known metric for this kind of problem
  • A package which does this/reports such a metric
  • An approximation of how speed relates to my specs, so that I know at least roughly how the relative speed measured on my machine would behave on others (linear, exponential, ...)

Final remark: My access to different machines is limited, so simply testing on several of them and looking at how they behave is an option, but only as a last resort. I would prefer a good approximation instead.


Solution

  • The relative performance ratio of two functions can differ across hardware. For example, one might be more sensitive to FMA throughput (and fall behind on Zen 1 or on pre-Haswell Intel CPUs), while another (maybe using a lookup table or memoization) might be more sensitive to cache footprint and get slower past some threshold size that depends on the machine's L2 and L3 cache sizes.

    Probably a useful thing to look at (on a single machine) is the simple relative performance (ratio of times), time(g) / time(f) and time(h) / time(f), across different problem sizes.

    So you might plot y = run time (in microseconds) against x = problem size for all 3 functions to see the absolute shape of the performance curves.

    You might separately plot relative performance to see where one is faster or slower relative to your baseline.

    If there are parameters other than a single size, then there's more problem space to explore with plots...
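
As a rough illustration of the plots described above, here is a minimal base-R sketch. Everything in it is an assumption made for the example: f, g and h are hypothetical stand-ins (three ways to compute a sum of squares), time_call is a helper written for this sketch, and the problem sizes are arbitrary. Swap in the real functions and inputs; a package such as microbenchmark could replace the hand-rolled timer.

```r
# Hypothetical stand-ins for the question's f, g and h: three ways to
# compute a sum of squares. Replace them with the real functions.
f <- function(x) { s <- 0; for (v in x) s <- s + v * v; s }  # baseline
g <- function(x) sum(x * x)
h <- function(x) drop(crossprod(x))

# Helper for this sketch: median elapsed seconds per call.
# The batch size is doubled until one batch runs for at least `min_batch`
# seconds, so that fast functions are not lost in the timer resolution.
time_call <- function(fun, x, reps = 3, min_batch = 0.02) {
  inner <- 1
  repeat {
    elapsed <- system.time(for (j in seq_len(inner)) fun(x))[["elapsed"]]
    if (elapsed >= min_batch) break
    inner <- inner * 2
  }
  batches <- vapply(seq_len(reps), function(i) {
    system.time(for (j in seq_len(inner)) fun(x))[["elapsed"]]
  }, numeric(1))
  median(batches) / inner
}

sizes <- c(1e3, 1e4, 1e5)            # arbitrary problem sizes
funs  <- list(f = f, g = g, h = h)

set.seed(1)
# One row per problem size, one column per function (seconds per call).
timings <- t(vapply(sizes, function(n) {
  x <- runif(n)
  vapply(funs, function(fun) time_call(fun, x), numeric(1))
}, numeric(length(funs))))
rownames(timings) <- sizes

# Relative performance: time(g) / time(f) and time(h) / time(f).
ratios <- timings[, c("g", "h"), drop = FALSE] / timings[, "f"]

# Draw the two plots only in an interactive session.
if (interactive()) {
  # Absolute run time in microseconds against problem size.
  matplot(sizes, timings * 1e6, type = "b", log = "xy",
          lty = 1, pch = 1:3, col = 1:3,
          xlab = "problem size", ylab = "time per call (microseconds)")
  legend("topleft", legend = colnames(timings), lty = 1, pch = 1:3, col = 1:3)

  # Run time relative to the baseline f; values below 1 are faster than f.
  matplot(sizes, ratios, type = "b", log = "xy",
          lty = 1, pch = 2:3, col = 2:3,
          xlab = "problem size", ylab = "time relative to f")
  abline(h = 1, lty = 2)
  legend("topright", legend = colnames(ratios), lty = 1, pch = 2:3, col = 2:3)
}
```

The ratios matrix is the thing to report: one row per problem size, with values below 1 meaning faster than f. The numbers still come from a single machine, so as noted above they may shift on hardware with different cache sizes or FMA throughput.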