Suppose you want to evaluate a simple glm model to forecast an economic data series. Consider the following code:
library(caret)
library(ggplot2)
data(economics)
h <- 7
myTimeControl <- trainControl(method = "timeslice",
                              initialWindow = 24*h,
                              horizon = 12,
                              fixedWindow = TRUE)
fit.glm <- train(unemploy ~ pce + pop + psavert,
                 data = economics,
                 method = "glm",
                 preProc = c("center", "scale", "BoxCox"),
                 trControl = myTimeControl)
Suppose that the covariates used in the train formula are predicted values obtained from some other model. This simple model gives the following results:
Generalized Linear Model
574 samples
3 predictor
Pre-processing: centered (3), scaled (3), Box-Cox transformation (3)
Resampling: Rolling Forecasting Origin Resampling (12 held-out with a fixed window)
Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...
Resampling results:
  RMSE      Rsquared
  1446.335  0.2958317
Setting aside the poor results (this is only an example), I wonder whether I am reading the output correctly. If I print the summary of fit.glm, I obtain:
Call:
NULL
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5090.0  -1025.5   -208.1    833.4   4948.4
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7771.56      64.93 119.688  < 2e-16 ***
pce          5750.27    1153.03   4.987 8.15e-07 ***
pop         -1483.01    1117.06  -1.328    0.185
psavert      2932.38     144.56  20.286  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 2420081)
Null deviance: 3999514594 on 573 degrees of freedom
Residual deviance: 1379446256 on 570 degrees of freedom
AIC: 10072
Number of Fisher Scoring iterations: 2
Do the parameters shown refer to the last trained GLM, or are they "average" parameters? I hope I've been clear enough.
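This resampling method works like any other: the RMSE is estimated using different subsets of the training data. Note that the output says "Summary of sample sizes: 168, 168, 168, 168, 168, 168, ...". The final model, however, uses all of the training data set, so the coefficients shown come from a single GLM fit to all 574 samples (hence the 573 null degrees of freedom), not from the last time slice and not from an average over slices.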
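To see which rows each resample uses, here is a minimal sketch that rebuilds the rolling windows with caret's createTimeSlices(), using the same settings as the trainControl() call in the question. As far as I know, train() creates its "timeslice" resamples with this same helper, but treat that as an assumption; the value n = 574 is simply taken from the printed output.

```r
library(caret)  # provides createTimeSlices()

n <- 574        # number of samples reported in the train() output
h <- 7

# Same window settings as the trainControl() call in the question
slices <- createTimeSlices(seq_len(n),
                           initialWindow = 24 * h,
                           horizon = 12,
                           fixedWindow = TRUE)

# Every training window has 24 * h = 168 rows; every test window has 12
table(lengths(slices$train))
table(lengths(slices$test))

# Row ranges of the first and last resample (training rows, then held-out rows)
n_slices <- length(slices$train)
range(slices$train[[1]]);        range(slices$test[[1]])
range(slices$train[[n_slices]]); range(slices$test[[n_slices]])
```

The resampled RMSE is computed from these 168-row fits only. On the fitted object from the question, nobs(fit.glm$finalModel) should report 574, which is consistent with the 573 null degrees of freedom in the summary.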
The difference between Rob's results and these is primarily due to the difference between Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
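As a rough illustration of why the two metrics can disagree, here is a small base R sketch. The error values are made up for this example and have nothing to do with the economics data or with Rob's results; the point is only that RMSE weights large errors more heavily than MAE does.

```r
# Made-up forecast errors (actual minus predicted); one of them is large
errors <- c(1, -1, 2, -10)

mae  <- mean(abs(errors))     # Mean Absolute Error
rmse <- sqrt(mean(errors^2))  # Root Mean Squared Error

mae   # 3.5
rmse  # about 5.15
```

RMSE is never smaller than MAE on the same errors, and the gap grows when a few errors are much larger than the rest, so numbers reported on the two scales are not directly comparable.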