Search code examples
rtime-seriesforecasting

Cross-validation of arima forecast in R


I would like to make sure that I am understanding R's fit and summary functions.

Here's how I'm using them for a time series with 100 data points which are divided into train and test samples:

x = ts(MyData)
train = x[1:80,1]
test = x[81:length(x), 1]
fit = arima(train, order=c(1,1,0))
summary(fit, test)

Am I correct in my thinking that summary will compare the fitted model for time steps 81 through 100 to the actual values x[81], x[82], ..., x[100]?


Solution

  • methods(summary) shows the following list:

    > methods(summary)
     [1] summary.aov                    summary.aovlist*            summary.aspell*
     [4] summary.check_packages_in_dir* summary.connection          summary.data.frame     
     [7] summary.Date                   summary.default             summary.ecdf*
    [10] summary.factor                 summary.glm                 summary.infl*
    [13] summary.lm                     summary.loess*              summary.manova
    [16] summary.matrix                 summary.mlm*                summary.nls*       
    [19] summary.packageStatus*         summary.PDF_Dictionary*     summary.PDF_Stream*
    [22] summary.POSIXct                summary.POSIXlt             summary.ppr*          
    [25] summary.prcomp*                summary.princomp*           summary.proc_time  
    [28] summary.shingle*               summary.srcfile             summary.srcref      
    [31] summary.stepfun                summary.stl*                summary.table 
    [34] summary.trellis*               summary.tukeysmooth*        summary.yearmon*   
    [37] summary.yearqtr*               summary.zoo*
    

    As you can see there are no methods for the Arima class (which is the class of your fit object), so this is not what happens (i.e. you are not comparing your forecasts with your actual values). You are using summary.default from the above list.

    You can see this from the following too:

    a <- arima(USAccDeaths, order = c(1,1,0))
    identical(summary(a), summary(a, USAccDeaths[1:100]))
    #[1] TRUE
    

    There is no difference between summary(a) and summary(a, USAccDeaths[1:100]).

    To compare using RMSE:

    library(forecast)
    fit <- arima(USAccDeaths[1:50], order = c(1,1,0))
    preds <- as.vector(forecast(fit, h = 10)$mean)
    RMSE <- sqrt(mean((preds - as.vector(USAccDeaths[51:60])) ^ 2))
    RMSE
    #[1] 2056.483
    

    The closer it is to zero the better your model.