r, machine-learning, r-caret, glm

How to extract RMSE from models built using caret?


I have built a glm model using the R package "caret" and I'd like to assess its performance using RMSE. The two RMSE values computed at the end of the code below are different, and I wonder which one is the real RMSE.

Also, how can I extract the training data, test data, and predicted data (based on the optimal tuned parameter) for each fold (5*5=25 in total) from the model?

library(caret)
data("mtcars")
set.seed(100)
mydata = mtcars[, -c(8,9)]

model_glm <- train(
  hp ~ ., 
  data = mydata, 
  method = "glm", 
  metric = "RMSE", 
  preProcess = c('center', 'scale'), 
  trControl = trainControl(
    method = "repeatedcv", 
    number = 5, 
    repeats = 5, 
    verboseIter = TRUE
  )
)

GLM.pred = predict(model_glm, subset(mydata, select = -hp))
RMSE(pred = GLM.pred, obs = mydata$hp)      # 21.89
model_glm$results$RMSE                      # 32.16

Solution

  • Computing the RMSE by hand from the model's predictions on the full dataset, I get:

    sqrt(mean((mydata$hp - predict(model_glm)) ^ 2))
    [1] 21.89127
    

    This matches "RMSE(pred = GLM.pred, obs = mydata$hp)", so 21.89 is the RMSE of the final model, fitted on all rows and evaluated on those same rows.

    The model object also stores one RMSE per resample:

    model_glm$resample$RMSE
     [1] 28.30254 34.69966 25.55273 25.29981 40.78493 31.91056 25.05311 41.83223 26.68105 23.64629 27.98388 25.98827 45.26982 37.28214
    [15] 38.13617 31.14513 23.35353 42.05274 34.04761 35.17733 28.28838 35.89639 21.42580 45.17860 29.13998
    

    which is the RMSE on the held-out fold for each of the 25 CV resamples (5 folds, 5 repeats). Taking their mean gives

    mean(model_glm$resample$RMSE)
    [1] 32.16515
    

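    So 32.16 is the average of the 25 cross-validation RMSEs, and 21.89 is the RMSE on the original dataset. Because 21.89 is computed on the same rows the model was fitted on, it tends to be optimistic; the cross-validated 32.16 is the better estimate of the error on unseen data.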
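
As for the second part of the question (the training data, test data, and predictions of each fold), here is a minimal sketch, assuming the model is refit with savePredictions = "final" in trainControl so that caret keeps the held-out predictions. The names train_idx, test_rows, fold_pred and rmse_by_fold are made up for this illustration. The training rows of each resample are read from model_glm$control$index, and the test rows are taken as the rows not used for training.

```r
library(caret)

data("mtcars")
set.seed(100)
mydata <- mtcars[, -c(8, 9)]

# Same setup as in the question, except that the held-out predictions
# of every resample are kept (they are discarded by default).
model_glm <- train(
  hp ~ .,
  data = mydata,
  method = "glm",
  metric = "RMSE",
  preProcess = c("center", "scale"),
  trControl = trainControl(
    method = "repeatedcv",
    number = 5,
    repeats = 5,
    savePredictions = "final"
  )
)

# Training row indices of the 25 resamples (a named list, one entry each)
train_idx <- model_glm$control$index

# Pick one resample; loop over seq_along(train_idx) to get all 25
i <- 1
fold_name  <- names(train_idx)[i]
test_rows  <- setdiff(seq_len(nrow(mydata)), train_idx[[i]])
train_data <- mydata[train_idx[[i]], ]
test_data  <- mydata[test_rows, ]

# Held-out predictions for that resample; rowIndex points back into mydata
fold_pred <- subset(model_glm$pred, Resample == fold_name)

# Per-resample RMSE recomputed from the saved predictions
rmse_by_fold <- sapply(
  split(model_glm$pred, model_glm$pred$Resample),
  function(d) sqrt(mean((d$obs - d$pred)^2))
)
mean(rmse_by_fold)  # should agree with model_glm$results$RMSE
```

Each entry of rmse_by_fold is computed only from rows the corresponding fit never saw, which is why the average differs from the RMSE on the full dataset.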