I've been trying to stack the predictions from two regression models (glmnet and bagEarth), but I keep getting the error "Error in FUN(X[[i]], ...) : { .... is not TRUE". From what I've read, this error usually stems from mismatched resampling indexes, but since I am training both models together with the same control object, I can't see how the indexes could differ. I've been able to reproduce it with random numbers:
library(caret)
library(caretEnsemble)
rm(list=ls())
training <- as.data.frame(cbind(runif(24, 1, 100),
                                runif(24, 1, 100),
                                runif(24, 1, 100),
                                runif(24, 1, 100),
                                runif(24, 1, 100),
                                runif(24, 1, 100)))
colnames(training) <- c("y", "x1", "x2", "x3", "x4", "x5")
set.seed(7)
ctrl <- trainControl(method = "cv", number = 3, returnResamp = "all", classProbs = FALSE, index = createMultiFolds(training$y, k = 3, times = 1))
model_list <- caretList(y~., data = training, trControl = ctrl, metric = "RMSE", methodList = c("glmnet", "bagEarth"))
train_ctrl <- trainControl(method = "cv", number = 3, classProbs = FALSE, savePredictions = TRUE, index = createMultiFolds(training$y, k = 3, times = 1))
glm_ensemble <- caretStack(model_list, method = "glm", metric = "RMSE", trControl = train_ctrl)
I know I am probably missing a key element somewhere. Any input is appreciated.
Thanks, Anton
After a bit of debugging, I found that the error comes from a function called bestPreds. It is not exported, and it looks in the model_list for the saved predictions ("all" or "final") recorded in each model's control object. You have not set savePredictions in your control object. If you add it, everything runs fine. I do admit that an informative error message would be nicer here than a bare "is not TRUE" failure.
ctrl <- trainControl(method = "cv", number = 3, returnResamp = "all",
                     savePredictions = "final", # needs to be "final" or "all"
                     classProbs = FALSE,
                     index = createMultiFolds(training$y, k = 3, times = 1))
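To confirm the fix before stacking, you can check that every model in the caretList actually carries hold-out predictions. The sketch below regenerates random data like the question's, trains with savePredictions = "final", and inspects each train object's `$control$savePredictions` setting and the row count of its `$pred` data frame. The simplified stacking trControl is an assumption, and it was written against the caretEnsemble interface used in the question; other caretEnsemble versions may behave differently.

```r
library(caret)
library(caretEnsemble)

# Random regression data shaped like the question's (24 rows, y + 5 predictors)
set.seed(7)
training <- as.data.frame(matrix(runif(24 * 6, 1, 100), ncol = 6))
colnames(training) <- c("y", "x1", "x2", "x3", "x4", "x5")

# Control object with savePredictions set, so the out-of-fold predictions
# of each base model are stored for caretStack to use
ctrl <- trainControl(method = "cv", number = 3, returnResamp = "all",
                     savePredictions = "final",
                     index = createMultiFolds(training$y, k = 3, times = 1))

model_list <- caretList(y ~ ., data = training, trControl = ctrl,
                        metric = "RMSE", methodList = c("glmnet", "bagEarth"))

# Sanity check: each model should report a saved-predictions setting
# and hold a non-empty data frame of hold-out predictions in $pred
saved  <- sapply(model_list, function(m) m$control$savePredictions)
n_pred <- sapply(model_list, function(m) NROW(m$pred))
print(saved)
print(n_pred)

# Stacking step; this trControl is a simplified assumption for illustration
glm_ensemble <- caretStack(model_list, method = "glm", metric = "RMSE",
                           trControl = trainControl(method = "cv", number = 3))
```

If `saved` shows "none" or `n_pred` contains a 0 for any model, caretStack has nothing to stack for that model, which is exactly the situation that triggers the opaque error.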