r, time-series, cross-validation, forecasting, arima

Forecasting with tsCV and auto.arima: predicted values in R


I want to run an out-of-sample forecast experiment using the auto.arima function, applying time series cross-validation with a fixed rolling window size. The goal is to obtain forecasts 1, 3 and 6 steps ahead.

library(forecast)
library(tseries)

# the time series (seed set so the simulated data are reproducible)
set.seed(123)
y1 = 2+ 0.15*(1:20) + rnorm(20,2)
y2 = y1[20]+ 0.3*(1:30) + rnorm(30,2)
y =  as.ts(c(y1,y2))

# 10 obs in test set, 40 obs in training set
ntest <- 10
ntrain <- length(y)-ntest

# auto.arima with some preferred specifications
farima <- function(x,h){forecast(auto.arima(x,ic="aic",test=c("adf"),seasonal=FALSE, 
                                        stepwise=FALSE, approximation = FALSE,
                                        method=c("ML")),h=h)}

# tsCV() returns the forecast errors as a matrix: one row per forecast origin, one column per horizon (1 to 6)
e <- tsCV(y, farima, h = 6, window = ntrain)

The predicted values are given by subtracting the error from the true value:

#predicted values
fc1 <- c(NA,y[2:50]-e[1:49,1])
fc1 <- fc1[41:50]

fc3 <- c(NA,y[2:50]-e[1:49,3])
fc3 <- fc3[41:50]

fc6 <- c(NA,y[2:50]-e[1:49,6])
fc6 <- fc6[41:50]

However, I'm not sure whether the predicted values for the 3-step-ahead forecasts are coded correctly, since the first 3-step-ahead forecast should be the prediction of the 43rd observation. Also, I don't understand why the matrix e has a 3-step-ahead error (3rd column) for observation 40. I thought the first 3-step-ahead forecast is obtained for observation 43, so there shouldn't be an error for observation 40.


Solution

  • Always read the help file:

    Value

    Numerical time series object containing the forecast errors as a vector (if h=1) and a matrix otherwise. The time index corresponds to the last period of the training data. The columns correspond to the forecast horizons.

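    So tsCV() returns errors in a matrix where the (i,j)th entry contains the error for forecast origin i and forecast horizon j. So the value in row 40 and column 3 is a 3-step error made at time 40, for time period 43. The corresponding forecast for period 43 is therefore y[43] - e[40,3], not y[41] - e[40,3] as in the question's fc3.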
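
As a rough sketch of how this indexing rule could be used to line the forecasts up with the periods they predict: the helper recover_forecasts() below is hypothetical (it is not part of the forecast package), and it assumes the layout described in the help file, i.e. e[i, h] = y[i + h] minus the h-step forecast made at origin i. To keep the example runnable without fitting any models, the error matrix is built by hand from a naive forecaster (whose forecast from origin i is simply y[i]) rather than by calling tsCV().

```r
# Hypothetical helper, not part of the forecast package.
# Assumes e[i, h] = y[i + h] - (h-step forecast made at origin i),
# and returns forecasts indexed by the period being forecast.
recover_forecasts <- function(y, e, h) {
  n <- length(y)
  stopifnot(h >= 1, h < n)
  e <- as.matrix(e)                 # also covers the vector returned when h = 1
  fc <- rep(NA_real_, n)
  targets <- (h + 1):n              # periods reachable from some origin
  fc[targets] <- y[targets] - e[targets - h, h]
  fc
}

# Toy data and a hand-built error matrix mimicking the tsCV() layout
# with a rolling window of 40 (origins 40..49) and horizons 1..6.
set.seed(1)
y <- cumsum(rnorm(50))
n <- length(y); H <- 6; window <- 40
e <- matrix(NA_real_, nrow = n, ncol = H)
for (i in window:(n - 1)) {
  for (h in 1:H) {
    # naive forecast from origin i is y[i] at every horizon
    if (i + h <= n) e[i, h] <- y[i + h] - y[i]
  }
}

fc1 <- recover_forecasts(y, e, 1)[41:50]  # forecasts of periods 41..50
fc3 <- recover_forecasts(y, e, 3)[41:50]  # NA for 41 and 42, then periods 43..50
fc6 <- recover_forecasts(y, e, 6)[41:50]  # NA for 41..45, then periods 46..50
```

With the question's own objects, the equivalent call would be recover_forecasts(y, e, 3)[41:50], assuming e comes from tsCV(y, farima, h = 6, window = 40). The first two entries are NA because no 3-step forecast of periods 41 or 42 can be made from origin 40 or later.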