Let us assume we observe data over a 1-year horizon on a monthly basis (k = 12). Let us also assume the following recursive structure (example below).
I wonder how this structure can be programmed most efficiently (efficiency will be absolutely necessary):
MWE:
beta <- matrix(c(0.95, 0.89, 0.23, 0.96, 0.98, 0.79, 0.99, 0.85, 0.97), nrow = 3)
phi  <- matrix(c(0.45, 0.12, 0.98, 0.66, 0.79, 0.24, 0.33, 0.19, 0.27), nrow = 3)
(In general, the parameters beta and phi are not observed but have to be estimated, so this is just an assumption for simplicity.)
In k-1 we calculate a matrix of the following form:
K1 <- beta %*% t(beta)
In k-2 we have a similar expression, but now it also depends on the previous value times phi:
K2 <- beta %*% t(beta) + phi %*% K1
In period k-3 the expression is the same as in k-2, except that K1 is replaced by K2. From there on the structure is recursive and repeats itself, so for the last observation (k-11) it is the same formula with K10.
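Written out naively (K is just a label I use here for the sequence K1, ..., K11), the recursion looks like this:

K <- list(beta %*% t(beta))                        # K1
for (j in 2:11) {
  K[[j]] <- beta %*% t(beta) + phi %*% K[[j - 1]]  # K2, ..., K11
}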
First of all, you always need the term beta %*% t(beta), so store it instead of calculating it in each iteration:
btb <- beta %*% t(beta)
Now you can use the Reduce function:
Reduce(f = function(kPrevious, someOther) {btb + phi %*% kPrevious},  # someOther (the element of x) is ignored
       x = 1:10,
       init = btb,
       accumulate = TRUE
)
The argument accumulate = TRUE saves all the intermediate results.
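For example (using the btb and phi defined above; res is just a name I picked for the result), the first element of the returned list is K1 and the last one is K11:

res <- Reduce(f = function(kPrevious, someOther) {btb + phi %*% kPrevious},
              x = 1:10,
              init = btb,
              accumulate = TRUE
)
length(res)                              # 11 matrices: K1, ..., K11
all.equal(res[[1]], btb)                 # K1 = beta %*% t(beta)
all.equal(res[[2]], btb + phi %*% btb)   # K2 = beta %*% t(beta) + phi %*% K1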
An alternative would be a simple for-loop. Although it does the "same", it is not as efficient as Reduce once the number of iterations gets large (see the benchmarks below):
history <- list(btb)
for (k in 1:10) {
  history <- c(history, list(btb + phi %*% history[[length(history)]]))
}
history
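As a side note, c(history, list(...)) copies the whole list in every iteration. If you prefer the loop, a preallocated list avoids that (a sketch reusing btb and phi from above; history2 is just my own name):

history2 <- vector("list", 11)   # room for K1, ..., K11
history2[[1]] <- btb
for (k in 2:11) {
  history2[[k]] <- btb + phi %*% history2[[k - 1]]
}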
For calculating some benchmarks, let's put everything in a function that depends on the number of iterations:
ff0 <- function(n = 10) {
  Reduce(f = function(kPrevious, someOther) {btb + phi %*% kPrevious},
         x = 1:n,
         init = btb,
         accumulate = TRUE
  )
}
ff <- function(n = 10) {
  history <- list(btb)
  for (k in 1:n) {
    history <- c(history, list(btb + phi %*% history[[length(history)]]))
  }
  history
}
Now, let's see whether the two functions give the same output:
all(mapply(FUN = function(x, y) {all(x == y)}, ff0(), ff()))
To get a benchmark, let's use microbenchmark:
microbenchmark::microbenchmark(ff0(),ff())
# Unit: microseconds
#  expr  min    lq   mean median    uq  max neval
# ff0() 20.6 22.80 25.388   24.1 25.20 79.2   100
#  ff() 11.4 12.25 13.922   13.0 13.65 80.0   100
microbenchmark::microbenchmark(ff0(100),ff(100))
# Unit: microseconds
#     expr   min     lq    mean median     uq   max neval
# ff0(100) 163.1 172.50 191.193 186.35 198.00 320.6   100
#  ff(100) 143.0 151.45 169.125 164.75 174.05 296.1   100
microbenchmark::microbenchmark(ff0(1000),ff(1000))
# Unit: milliseconds
#      expr    min      lq     mean  median      uq     max neval
# ff0(1000) 1.5680 1.62505 1.698665 1.67135 1.74185  2.6058   100
#  ff(1000) 4.6626 4.77065 5.293329 4.87640 5.10260 11.2255   100
microbenchmark::microbenchmark(ff0(10000),ff(10000))
# Unit: milliseconds
#       expr      min       lq      mean   median        uq      max neval
# ff0(10000)  15.9035  16.4428  17.28437  17.0618  17.71085  22.1506   100
#  ff(10000) 468.8924 484.7448 494.79808 489.4221 496.60075 581.0389   100
As we can see, the for-loop implementation does better for a relatively small number of iterations, but as the number of iterations grows, Reduce becomes faster. I think the reason is that Reduce comes with some fixed overhead, while the for-loop copies the ever-growing history list via c() in every iteration, which gets more and more expensive as the number of iterations increases.
So it is up to you which implementation you will use.