
R: With the microbenchmark library, how best to build a table of multiple timing results when the results also need adjusting (ideally wrapped in a function)


I wish to compare the speed between data.table and dplyr functions.

Namely, I wish to rename a column with each library and see what the median and relative timings are for each.

The following code uses the built-in dataset "swiss". What I've done works, but it will bloat the script if I repeat it many times, so I'm after an approach that wraps what I've done in a function, or a different and more efficient approach:

library(microbenchmark)
library(dplyr)
library(data.table)


DT <- as.data.table(datasets::swiss, keep.rownames=TRUE)
Tib <- tibble::as_tibble(tibble::rownames_to_column(datasets::swiss))

res <-
microbenchmark(
  data.table = setnames(copy(DT), old = "rn", new = "Region"),
  copyOverhead = copy(DT),
  dplyr = rename(Tib, Region = rowname),
  unit = "ms",
  times = 100
)

Note that because data.table modifies the object by reference, I had to copy it inside the benchmark; otherwise it would fail after the first iteration, since the column named in the 'old' argument of setnames() would already have been renamed. I therefore need to record the time that copy() takes and subtract it from the data.table setnames() timing.

res <- setDT(summary(res))

res <- res[1, median := median - res[2, median]][c(1,3), .(expr, median)] # subtract the time it takes to copy and delete the copyOverhead row afterwards

res[, relative := median / min(median)] # relative timings: 1 for the fastest expression, the ratio for the slower one

res

Because I had to subtract the time of the copy() operation, the microbenchmark object is now a data.table, so I can't just print the relative units with print(res, unit = "relative"); I had to compute them manually.

So, this works. But I want to create many different comparisons of this nature. Is there some way to wrap the microbenchmark call in a function that also removes the timing of the copy() operation? I thought it might involve the ... argument, but I don't really know how that works, and I can't get the arguments passed to ... in the wrapper function through to the arguments of the microbenchmark function.

I'd also be open to a different approach altogether!


Solution

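  • You can use the setup argument, which microbenchmark runs untimed before each evaluation, so the copy no longer needs to be timed and subtracted: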

    microbenchmark(
      data.table = setnames(dt, old = "rn", new = "Region"),
      dplyr = rename(tb, Region = rowname),
      setup = {dt <- copy(DT); tb <- copy(Tib)},
      unit = "ms",
      times = 100
    )
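
    If you still want this wrapped in a function, here is a minimal sketch rather than a definitive implementation. The name bench_table is hypothetical (not part of microbenchmark), and the sketch assumes the microbenchmark, data.table, dplyr and tibble packages are installed. It rebuilds its own call with match.call() and evaluates it in the caller's frame, so the benchmark expressions and the setup expression reach microbenchmark() unevaluated instead of being forced once inside the wrapper.

```r
library(microbenchmark)
library(data.table)
library(dplyr)

# Hypothetical helper: forwards every argument to microbenchmark() and
# returns a small table of median and relative timings.
bench_table <- function(..., unit = "ms") {
  cl <- match.call()                                # capture the call as written
  cl[[1L]] <- quote(microbenchmark::microbenchmark) # swap in the real function
  cl$unit <- unit                                   # make the unit explicit
  mb <- eval(cl, parent.frame())                    # run it where the caller's objects live

  res <- as.data.table(summary(mb, unit = unit))[, .(expr, median)]
  res[, relative := median / min(median)]           # fastest expression gets 1
  res[]
}

DT  <- as.data.table(datasets::swiss, keep.rownames = TRUE)
Tib <- tibble::as_tibble(tibble::rownames_to_column(datasets::swiss))

out <- bench_table(
  data.table = setnames(dt, old = "rn", new = "Region"),
  dplyr      = rename(tb, Region = rowname),
  setup      = {dt <- copy(DT); tb <- Tib},         # untimed, run before each evaluation
  times      = 100
)
out
```

    Because setup is evaluated in the calling environment, dt and tb are created there as a side effect. If you keep the plain microbenchmark object instead of this table, print(res, unit = "relative") works again, since nothing has been subtracted by hand.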