How to calculate time differences without copying


I would like to compute the difference in seconds between two POSIXct objects without copying them. This shouldn't be a problem, because they are stored as numeric UNIX time, so a plain subtraction is all that is necessary. The issue is that the - operator dispatches on the class of its operands, so -.POSIXt is called, which in turn calls difftime. According to tracemem, four copies are made along the way:

> a <- as.POSIXct(runif(1e6, 0, 1000), origin = '1970-01-01')
> b <- as.POSIXct(runif(1e6, 0, 1000), origin = '1970-01-01')
> a_trace <- tracemem(a)
> b_trace <- tracemem(b)
> z <- a - b
tracemem[0x000000004c082470 -> 0x000007fff54e0010]: difftime -.POSIXt 
tracemem[0x000007fff8c80010 -> 0x000007ffe9490010]: difftime -.POSIXt 
tracemem[0x000007ffe9490010 -> 0x000007ffe8530010]: structure .difftime difftime -.POSIXt 
tracemem[0x000007ffe8530010 -> 0x000007ffe7d80010]: structure .difftime difftime -.POSIXt 

Another problem with this is that by default difftime may choose an output unit besides seconds. This is avoided by calling it explicitly with a units argument, but the four copies are still made:

> z <- difftime(a, b, units = 'secs')
tracemem[0x000000004c082470 -> 0x000007ffe70a0010]: difftime 
tracemem[0x000007fff8c80010 -> 0x000007ffe68f0010]: difftime 
tracemem[0x000007ffe68f0010 -> 0x000007ffde890010]: structure .difftime difftime 
tracemem[0x000007ffde890010 -> 0x000007ffde0e0010]: structure .difftime difftime 
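
To illustrate the unit selection, here is a minimal sketch using two small made-up timestamps (the names `t1` and `t0` are illustrative and are not the vectors from above). With a gap of two hours, plain subtraction reports hours, while an explicit `units` argument, or `as.numeric()` with `units`, gives seconds:

```r
# Two illustrative timestamps, 7200 seconds apart (hypothetical values).
t1 <- as.POSIXct(7200, origin = "1970-01-01", tz = "UTC")
t0 <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")

# The - operator lets difftime pick the unit automatically.
auto_diff <- t1 - t0
units(auto_diff)        # "hours"
as.numeric(auto_diff)   # 2

# Requesting seconds explicitly.
secs_diff <- difftime(t1, t0, units = "secs")
as.numeric(secs_diff)   # 7200

# Converting an existing difftime to seconds when dropping the class.
as.numeric(auto_diff, units = "secs")   # 7200
```

None of these calls avoids the copies shown in the traces; they only make the unit predictable.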

Also, the resulting object is of class difftime, instead of plain numeric. Using base R, an additional copy of the result is necessary to eliminate the difftime class:

> z_trace <- tracemem(z)
> class(z) <- NULL
tracemem[0x000007ffb28e0010 -> 0x000007ffb2130010]:
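
One detail worth checking when stripping the class this way: a difftime object also carries a `units` attribute, and `class(z) <- NULL` does not remove it. The sketch below uses small made-up vectors (`p` and `q` are illustrative names, not the vectors above) to compare that approach with `as.numeric()`, which returns a vector without attributes. It says nothing about how many copies either approach makes.

```r
# Small illustrative inputs (hypothetical values).
p <- as.POSIXct(c(10, 20, 30), origin = "1970-01-01", tz = "UTC")
q <- as.POSIXct(c(1, 2, 3), origin = "1970-01-01", tz = "UTC")
d <- difftime(p, q, units = "secs")

# Removing only the class leaves the "units" attribute attached.
d_stripped <- d
class(d_stripped) <- NULL
attr(d_stripped, "units")   # "secs"

# as.numeric() drops all attributes, giving a bare numeric vector.
d_plain <- as.numeric(d)
attributes(d_plain)         # NULL
```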

Using data.table::setattr I have devised the following function:

fast_difftime <- function(a, b) {

  classA <- attr(a, 'class')
  classB <- attr(b, 'class')

  on.exit({
    data.table::setattr(a, 'class', classA)
    data.table::setattr(b, 'class', classB)
  })

  data.table::setattr(a, 'class', NULL)
  data.table::setattr(b, 'class', NULL)

  a - b

}

This avoids the copying, and is much faster:

> microbenchmark::microbenchmark(fast_difftime(a, b), as.numeric(difftime(a, b, units = "secs")))
Unit: milliseconds
                                       expr      min        lq     mean    median        uq      max neval cld
                        fast_difftime(a, b) 1.728555  4.213836  5.97520  4.392592  6.365763 127.1690   100  a 
 as.numeric(difftime(a, b, units = "secs")) 6.643092 19.352806 24.54938 19.861066 23.298505 137.0776   100   b

However, I don't like the fact that I have to modify the attributes of the input vectors in-place, just to avoid method dispatch. Is there a better way?


Solution

  • Rcpp would be an option because you can ignore the class attribute:

    library(Rcpp)
    cppFunction(
      'NumericVector mydiff(const NumericVector x, const NumericVector y) {
           return x - y;
       }
      ')
    
    
    microbenchmark::microbenchmark(fast_difftime(a, b), mydiff(a, b))
    #Unit: milliseconds
    #                expr      min       lq     mean   median       uq      max neval cld
    # fast_difftime(a, b) 2.248841 2.291861 3.489386 2.326559 2.379951 46.69430   100   a
    #        mydiff(a, b) 2.165105 2.209661 3.089114 2.229380 2.272144 10.96047   100   a