Search code examples
rdata.tablemoving-averagerolling-computation

fast way to calculate moving average/rolling function which allows custom weights


One can use TTR:SMA() or TTR::EMA() but these do not allow custom weights. One solution I came up with is using data.table::frollapply:

library(data.table)
x <- data.table(type=rep(1:100, 10000), val=sample(1:1000000, 1000000))

my.roll.2 <- function(x, weights=NULL) {
    n <- length(x)
    if(is.null(weights)) weights <- 1/(1:n)
    sum(weights[1:n]*x, na.rm=T)/sum(weights[1:n])
}

my.roll.1 <- function(x, n, name, ref.col, val.col, weights=NULL) {
    x[, (name) := frollapply(get(val.col), n, my.roll.2, weights=weights), by=ref.col]
}

However the performance of my.roll.1 is not great (and it gets exponentially worse compared to the other ones the bigger the input data):

library(microbenchmark)
library(TTR)
n <- 10
name <- 'test'
microbenchmark(
    my.roll=my.roll.1(x, n, name, 'type', 'val')
  , frollmean=x[, (name):=data.table::frollmean(val, n), type]
  , EMA=x[, (name):=TTR::EMA(val, n), type]
  , times=10L
)

Results:

Unit: milliseconds
      expr       min        lq       mean     median        uq       max neval
   my.roll 7661.0278 7666.0732 7698.69693 7693.28025 7708.6880 7778.6171    10
 frollmean   17.1724   17.6321   19.54878   19.56485   20.9490   23.5549    10
       EMA   43.0090   43.7332   45.92251   45.79210   47.2391   51.9399    10

data.table::frollmean is very fast (implemented in C) but it doesn't use any weights. TTR::EMA uses only the EMA specific weights/smoothing (the only flexibility is to use the parameter wilder=TRUE or wilder=FALSE). My need is to achieve the functionality of my.roll.1 but to be faster.


Solution

  • As mentioned in comments, a possible solution using stats::filter.

    Note following arguments:

    • sides = 1 so that the filter uses past values only
    • coefficients order inverted 1/(n:1) because filter calculation starts with most recent value
    my.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)]
    # Conversion from ts to numeric
    my.roll[,test2:=as.numeric(test2)][]
    
    # type    val     test    test2
    # <int>  <int>    <num>    <num>
    #       1:     1 955625       NA       NA
    #       2:     2 979596       NA       NA
    #       3:     3 578778       NA       NA
    #       4:     4 174631       NA       NA
    #       5:     5 459947       NA       NA
    # ---                               
    #  999996:    96 191233 620505.8 620505.8
    #  999997:    97 626522 398615.6 398615.6
    #  999998:    98 527846 565061.2 565061.2
    #  999999:    99 480277 537305.9 537305.9
    # 1000000:   100 757433 395458.3 395458.3
    
    all.equal(my.roll$test,my.roll$test2)
    #[1] TRUE
    
    

    Speed comparison:

    microbenchmark::microbenchmark(
      my.roll=my.roll.1(x, n, name, 'type', 'val'),
      filter={my.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)][];
              my.roll[,test2:=as.numeric(test2)][]
              }
      , times=10L
    )
    
    Unit: milliseconds
        expr       min        lq       mean     median        uq       max neval
     my.roll 2194.3200 2203.3726 2264.67423 2245.04510 2314.2377 2401.1156    10
      filter   73.1602   76.3098   78.18358   77.25665   80.4204   85.0567    10