One can use TTR:SMA()
or TTR::EMA()
but these do not allow custom weights. One solution I came up with is using data.table::frollapply
:
library(data.table)
x <- data.table(type=rep(1:100, 10000), val=sample(1:1000000, 1000000))
my.roll.2 <- function(x, weights=NULL) {
n <- length(x)
if(is.null(weights)) weights <- 1/(1:n)
sum(weights[1:n]*x, na.rm=T)/sum(weights[1:n])
}
my.roll.1 <- function(x, n, name, ref.col, val.col, weights=NULL) {
x[, (name) := frollapply(get(val.col), n, my.roll.2, weights=weights), by=ref.col]
}
However the performance of my.roll.1
is not great (and it gets exponentially worse compared to the other ones the bigger the input data):
library(microbenchmark)
library(TTR)
n <- 10
name <- 'test'
microbenchmark(
my.roll=my.roll.1(x, n, name, 'type', 'val')
, frollmean=x[, (name):=data.table::frollmean(val, n), type]
, EMA=x[, (name):=TTR::EMA(val, n), type]
, times=10L
)
Results:
Unit: milliseconds
expr min lq mean median uq max neval
my.roll 7661.0278 7666.0732 7698.69693 7693.28025 7708.6880 7778.6171 10
frollmean 17.1724 17.6321 19.54878 19.56485 20.9490 23.5549 10
EMA 43.0090 43.7332 45.92251 45.79210 47.2391 51.9399 10
data.table::frollmean
is very fast (implemented in C) but it doesn't use any weights. TTR::EMA
uses only the EMA specific weights/smoothing (the only flexibility is to use the parameter wilder=TRUE
or wilder=FALSE
). My need is to achieve the functionality of my.roll.1
but to be faster.
As mentioned in comments, a possible solution using stats::filter
.
Note following arguments:
1/(n:1)
because filter calculation starts with most recent valuemy.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)]
# Conversion from ts to numeric
my.roll[,test2:=as.numeric(test2)][]
# type val test test2
# <int> <int> <num> <num>
# 1: 1 955625 NA NA
# 2: 2 979596 NA NA
# 3: 3 578778 NA NA
# 4: 4 174631 NA NA
# 5: 5 459947 NA NA
# ---
# 999996: 96 191233 620505.8 620505.8
# 999997: 97 626522 398615.6 398615.6
# 999998: 98 527846 565061.2 565061.2
# 999999: 99 480277 537305.9 537305.9
# 1000000: 100 757433 395458.3 395458.3
all.equal(my.roll$test,my.roll$test2)
#[1] TRUE
Speed comparison:
microbenchmark::microbenchmark(
my.roll=my.roll.1(x, n, name, 'type', 'val'),
filter={my.roll[,test2:=filter(val,prop.table(1/n:1),sides=1),by=.(type)][];
my.roll[,test2:=as.numeric(test2)][]
}
, times=10L
)
Unit: milliseconds
expr min lq mean median uq max neval
my.roll 2194.3200 2203.3726 2264.67423 2245.04510 2314.2377 2401.1156 10
filter 73.1602 76.3098 78.18358 77.25665 80.4204 85.0567 10