I would like to compute the difference in seconds between two POSIXct objects without copying them. This shouldn't be a problem, because they are stored as numeric UNIX time, so a plain subtraction is all that is necessary. The issue is that the - operator is dispatched based on the classes of the objects, so difftime is called. The difftime function makes four copies in total, as the tracemem output shows:
> a <- as.POSIXct(runif(1e6, 0, 1000), origin = '1970-01-01')
> b <- as.POSIXct(runif(1e6, 0, 1000), origin = '1970-01-01')
> a_trace <- tracemem(a)
> b_trace <- tracemem(b)
> z <- a - b
tracemem[0x000000004c082470 -> 0x000007fff54e0010]: difftime -.POSIXt
tracemem[0x000007fff8c80010 -> 0x000007ffe9490010]: difftime -.POSIXt
tracemem[0x000007ffe9490010 -> 0x000007ffe8530010]: structure .difftime difftime -.POSIXt
tracemem[0x000007ffe8530010 -> 0x000007ffe7d80010]: structure .difftime difftime -.POSIXt
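To make the storage claim concrete, here is a minimal sketch (t1, t2 and secs are illustrative names, not from my real code) showing that a POSIXct vector is a double vector with a class attribute, so the difference in seconds is just a subtraction of the underlying numbers. Note that unclass() is used only for illustration and may itself copy the data:

```r
# Two small POSIXct vectors with known offsets from the epoch (in seconds)
t1 <- as.POSIXct(c("1970-01-01 00:00:10", "1970-01-01 00:04:10"), tz = "UTC")
t2 <- as.POSIXct(c("1970-01-01 00:00:04", "1970-01-01 00:01:40"), tz = "UTC")

typeof(t1)  # "double"
class(t1)   # "POSIXct" "POSIXt"

# Subtracting the underlying numbers gives the difference in seconds;
# as.numeric() drops leftover attributes such as tzone
secs <- as.numeric(unclass(t1) - unclass(t2))
secs        # 6 150
```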
Another problem is that by default difftime may choose an output unit other than seconds. This is avoided by calling it explicitly with a units argument, but the four copies are still made:
> z <- difftime(a, b, units = 'secs')
tracemem[0x000000004c082470 -> 0x000007ffe70a0010]: difftime
tracemem[0x000007fff8c80010 -> 0x000007ffe68f0010]: difftime
tracemem[0x000007ffe68f0010 -> 0x000007ffde890010]: structure .difftime difftime
tracemem[0x000007ffde890010 -> 0x000007ffde0e0010]: structure .difftime difftime
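The automatic unit selection can be seen in a small sketch (t_start and t_end are made-up values): for two timestamps two days apart, the default choice is days rather than seconds, while units = "secs" forces seconds:

```r
t_start <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
t_end   <- as.POSIXct("2020-01-03 00:00:00", tz = "UTC")

# Default: difftime picks the unit automatically
d_auto <- t_end - t_start
units(d_auto)       # "days"
as.numeric(d_auto)  # 2

# Explicit units: always seconds
d_secs <- difftime(t_end, t_start, units = "secs")
units(d_secs)       # "secs"
as.numeric(d_secs)  # 172800
```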
Also, the resulting object is of class difftime instead of plain numeric. Using base R, an additional copy of the result is necessary to eliminate the difftime class:
> z_trace <- tracemem(z)
> class(z) <- NULL
tracemem[0x000007ffb28e0010 -> 0x000007ffb2130010]:
Using data.table::setattr, I have devised the following function:
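fast_difftime <- function(a, b) {
  # Remember the original classes so they can be restored on exit
  classA <- attr(a, 'class')
  classB <- attr(b, 'class')
  on.exit({
    data.table::setattr(a, 'class', classA)
    data.table::setattr(b, 'class', classB)
  })
  # Strip the class by reference so that - does not dispatch to -.POSIXt
  data.table::setattr(a, 'class', NULL)
  data.table::setattr(b, 'class', NULL)
  a - b
}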
This avoids the copying, and is much faster:
> microbenchmark::microbenchmark(fast_difftime(a, b), as.numeric(difftime(a, b, units = "secs")))
Unit: milliseconds
                                       expr      min        lq     mean    median        uq      max neval cld
                        fast_difftime(a, b) 1.728555  4.213836  5.97520  4.392592  6.365763 127.1690   100  a
 as.numeric(difftime(a, b, units = "secs")) 6.643092 19.352806 24.54938 19.861066 23.298505 137.0776   100   b
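As a quick sanity check, here is a sketch on small made-up inputs x and y. The function is repeated so the snippet runs on its own, and it assumes the data.table package is installed. The result matches the difftime-based version, and the inputs have their classes back after the call:

```r
# Same function as above, repeated so this snippet is self-contained
fast_difftime <- function(a, b) {
  classA <- attr(a, 'class')
  classB <- attr(b, 'class')
  on.exit({
    data.table::setattr(a, 'class', classA)
    data.table::setattr(b, 'class', classB)
  })
  data.table::setattr(a, 'class', NULL)
  data.table::setattr(b, 'class', NULL)
  a - b
}

x <- as.POSIXct(c("2020-01-01 00:00:30", "2020-01-01 00:10:00"), tz = "UTC")
y <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 00:00:00"), tz = "UTC")

res <- fast_difftime(x, y)
ref <- as.numeric(difftime(x, y, units = "secs"))

# Compare values only: the raw subtraction may carry over
# non-class attributes (such as tzone) from the inputs
same_values <- isTRUE(all.equal(res, ref, check.attributes = FALSE))

# The class attribute of both inputs is restored by on.exit()
classes_restored <- identical(class(x), c("POSIXct", "POSIXt")) &&
  identical(class(y), c("POSIXct", "POSIXt"))
```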
However, I don't like the fact that I have to modify the attributes of the input vectors in-place, just to avoid method dispatch. Is there a better way?
Rcpp would be an option because you can ignore the class attribute:
library(Rcpp)
cppFunction(
  'NumericVector mydiff(const NumericVector x, const NumericVector y) {
     return x - y;
   }
')
microbenchmark::microbenchmark(fast_difftime(a, b), mydiff(a, b))
#Unit: milliseconds
#                expr      min       lq     mean   median       uq      max neval cld
# fast_difftime(a, b) 2.248841 2.291861 3.489386 2.326559 2.379951 46.69430   100   a
#        mydiff(a, b) 2.165105 2.209661 3.089114 2.229380 2.272144 10.96047   100   a