
GAM method without resampling in caret produces stop error


I wrote a function within lapply to fit a GAM (with splines) for each element in a vector of response variables within a data frame. I opted to use caret to fit the models instead of directly using mgcv or the gam package because I would like to eventually split my data into a train/test set for validation and use various resampling techniques. For now, I simply have the trainControl method set to 'none' like so:

  library(caret)

  # Set resampling method
  # tc <- trainControl(method = "boot", number = 100)
  # tc <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
  tc <- trainControl(method = "none")

  # group holds the response variable names; inf.factors is the right-hand side of the formula
  fm <- lapply(group, function(x) {
    printFormula <- paste(x, "~", inf.factors)
    inputFormula <- as.formula(printFormula)
    # Partition input data for model training and testing
    # dpart <- createDataPartition(mdata[, x], times = 1, p = 0.7, list = FALSE)
    # train <- mdata[dpart, ]
    # test <- mdata[-dpart, ]

    cat("Fitting:", printFormula, "\n")
    # gam(inputFormula, family = binomial(link = "logit"), data = mdata)
    train(inputFormula, family = binomial(link = "logit"), data = mdata, method = "gam",
          trControl = tc)
  })

When I execute this code, I receive the following error:

Error in train.default(x, y, weights = w, ...) : 
  Only one model should be specified in tuneGrid with no resampling

If I re-run the code in debugging mode, I can find where caret stops the training process:

if (trControl$method == "none" && nrow(tuneGrid) != 1) 
    stop("Only one model should be specified in tuneGrid with no resampling")

Clearly the train function fails because of the second condition, but when I look up the tuning parameters for a GAM (with splines) there is only an option for feature selection (not interested, I want to keep all the predictors in the model) and the method. Consequently, I do not include a tuneGrid data frame when I call train. Is this the reason why the model is failing in this way? What parameter would I provide and what would the tuneGrid look like?

I should add that the model trains successfully when I use bootstrapping or k-fold CV; however, these resampling methods take much longer to run and I do not need them yet.

Any help on this issue would be appreciated!


Solution

  • For that model, the default tuning grid covers two values of the select parameter:

    > getModelInfo("gam", regex = FALSE)[[1]]$grid
    function(x, y, len = NULL, search = "grid") {
      if(search == "grid") {
        out <- expand.grid(select = c(TRUE, FALSE), method = "GCV.Cp")
      } else {
        out <- data.frame(select = sample(c(TRUE, FALSE), size = len, replace = TRUE),
                          method = sample(c("GCV.Cp", "ML"), size = len, replace = TRUE))
      }
      out[!duplicated(out),]
    }
    

    You should pass something like tuneGrid = data.frame(select = FALSE, method = "GCV.Cp") to train so that only a single model is evaluated (as the error message says).
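
To make this concrete, here is a minimal sketch of the fix on simulated data. The data frame toy and its columns x1, x2 and y are hypothetical stand-ins for mdata and are not from the question; the sketch also assumes that both caret and mgcv are installed. It first checks how many rows the default grid has, then fits a single model with a one-row tuneGrid. Using select = FALSE matches the stated goal of keeping every predictor in the model.

```r
library(caret)  # assumes the caret and mgcv packages are installed

set.seed(1)

# Hypothetical toy data standing in for mdata (not from the question)
toy <- data.frame(x1 = runif(200), x2 = runif(200))
toy$y <- factor(ifelse(toy$x1 + rnorm(200, sd = 0.3) > 0.5, "yes", "no"))

# Default grid that caret builds for method = "gam"; more than one row here
# is what triggers the stop() when trainControl uses method = "none"
default_grid <- getModelInfo("gam", regex = FALSE)[[1]]$grid(x = NULL, y = NULL)
print(nrow(default_grid))

# One-row grid: no extra selection penalty (select = FALSE), GCV.Cp smoothing
single_grid <- data.frame(select = FALSE, method = "GCV.Cp")

tc <- trainControl(method = "none")
fit <- train(y ~ x1 + x2, data = toy, method = "gam",
             family = binomial(link = "logit"),
             trControl = tc, tuneGrid = single_grid)

# With no resampling, the single supplied row is the model that was fitted
print(fit$bestTune)
```

In the lapply call from the question, the same change would amount to adding tuneGrid = single_grid (or the inline data.frame) to the existing train call; switching back to "boot" or "repeatedcv" later should only require changing tc.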