I would like to make sure that I am understanding R's fit
and summary
functions.
Here's how I'm using them for a time series with 100 data points which are divided into train and test samples:
x = ts(MyData)
train = x[1:80,1]
test = x[81:length(x), 1]
fit = arima(train, order=c(1,1,0))
summary(fit, test)
Am I correct in my thinking that summary
will compare the fitted model for time steps 81 through 100 to the actual values x[81], x[82], ..., x[100]
?
methods(summary)
shows the following list:
> methods(summary)
[1] summary.aov summary.aovlist* summary.aspell*
[4] summary.check_packages_in_dir* summary.connection summary.data.frame
[7] summary.Date summary.default summary.ecdf*
[10] summary.factor summary.glm summary.infl*
[13] summary.lm summary.loess* summary.manova
[16] summary.matrix summary.mlm* summary.nls*
[19] summary.packageStatus* summary.PDF_Dictionary* summary.PDF_Stream*
[22] summary.POSIXct summary.POSIXlt summary.ppr*
[25] summary.prcomp* summary.princomp* summary.proc_time
[28] summary.shingle* summary.srcfile summary.srcref
[31] summary.stepfun summary.stl* summary.table
[34] summary.trellis* summary.tukeysmooth* summary.yearmon*
[37] summary.yearqtr* summary.zoo*
As you can see there are no methods for the Arima
class (which is the class of your fit
object), so this is not what happens (i.e. you are not comparing your forecasts with your actual values). You are using summary.default
from the above list.
You can see this from the following too:
a <- arima(USAccDeaths, order = c(1,1,0))
identical(summary(a), summary(a, USAccDeaths[1:100]))
#[1] TRUE
There is no difference between summary(a)
and summary(a, USAccDeaths[1:100])
.
To compare using RMSE:
library(forecast)
fit <- arima(USAccDeaths[1:50], order = c(1,1,0))
preds <- as.vector(forecast(fit, h = 10)$mean)
RMSE <- sqrt(mean((preds - as.vector(USAccDeaths[51:60])) ^ 2))
RMSE
#[1] 2056.483
The closer it is to zero the better your model.