
Object p not found when running gbm()


I am aware of the question "GBM: Object 'p' not found"; however, it did not contain enough information for anyone to answer it. I don't believe this is a duplicate: I have followed what was suggested in that question and in its linked duplicate, "Error in R gbm function when cv.folds > 0", which does not describe the same error.

I have been sure to follow the recommendation of leaving out any columns that were not used in the model.

This error appears when cv.folds is greater than 0: object 'p' not found

From what I can see, setting cv.folds to 0 does not produce meaningful output. I have tried different distributions, fractions, numbers of trees, etc. I'm confident I've parameterized something incorrectly, but I can't for the life of me see what it is.

Model and output:

library(gbm)

model_output <- gbm(formula = ign ~ .,
                    distribution = "bernoulli",
                    var.monotone = rep(0, 9),
                    data = model_sample,
                    train.fraction = 0.50,
                    n.cores = 1,
                    n.trees = 150,
                    cv.folds = 1,
                    keep.data = TRUE,
                    verbose = TRUE)
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1           nan             nan     0.1000       nan
     2           nan             nan     0.1000       nan
     3           nan             nan     0.1000       nan
     4           nan             nan     0.1000       nan
     5           nan             nan     0.1000       nan
     6           nan             nan     0.1000       nan
     7           nan             nan     0.1000       nan
     8           nan             nan     0.1000       nan
     9           nan             nan     0.1000       nan
    10           nan             nan     0.1000       nan
    20           nan             nan     0.1000       nan
    40           nan             nan     0.1000       nan
    60           nan             nan     0.1000       nan
    80           nan             nan     0.1000       nan
   100           nan             nan     0.1000       nan
   120           nan             nan     0.1000       nan
   140           nan             nan     0.1000       nan
   150           nan             nan     0.1000       nan

The minimal data needed to reproduce the error used to be posted here. However, once the suggestion by @StupidWolf is applied, that sample is too small. The suggestion below gets past the initial error; subsequent errors are occurring, and solutions will be posted here upon discovery.


Solution

  • gbm is not meant to handle cv.folds = 1. By definition, k-fold cross-validation splits the data into k parts, trains on k - 1 of them, and tests on the remaining part, so it is not clear what 1-fold cross-validation would mean. If you look at the code for gbm, at line 437:

    if (cv.folds > 1) {
      cv.results <- gbmCrossVal(cv.folds = cv.folds, nTrain = nTrain,
      ....
      p <- cv.results$predictions
    }
    

    That block computes the cross-validation predictions. When the results are collected into the gbm object, at line 471:

    if (cv.folds > 0) {
      gbm.obj$cv.fitted <- p
    }
    

    So if cv.folds == 1, p is never computed, but cv.folds is still > 0, so the assignment runs and you get the error.
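
    The mismatch between the two conditions can be reproduced without gbm at all. The following is only a sketch of the control flow described above: collect_cv() is a hypothetical stand-in written for illustration, not gbm's actual source, and the value assigned to p is a placeholder.

    ```r
    # Hypothetical stand-in for the control flow described above; NOT gbm's source.
    collect_cv <- function(cv.folds) {
      result <- list()
      if (cv.folds > 1) {
        # Only reached for two or more folds, so `p` is assigned only then.
        p <- seq_len(cv.folds)  # placeholder for the cross-validation predictions
      }
      if (cv.folds > 0) {
        # Also reached for cv.folds = 1, where `p` was never assigned.
        result$cv.fitted <- p
      }
      result
    }

    # With one fold, reading `p` fails; capture the error message instead of stopping.
    err <- tryCatch(collect_cv(1), error = function(e) conditionMessage(e))

    # With two folds, `p` exists and the result is filled in.
    ok <- collect_cv(2)
    ```

    In a fresh session with no global variable named p, err holds the "not found" message for p, while collect_cv(0) and collect_cv(2) both return normally.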

    Below is a reproducible example:

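    library(MASS)  # provides the Pima.tr data set
    library(gbm)
    test = Pima.tr
    # Recode the factor response (No/Yes) as 0/1 for the bernoulli distribution
    test$type = as.numeric(test$type)-1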
    
    model_output <- gbm(type~ . , 
                      distribution = "bernoulli",
                      var.monotone = rep(0,7),
                      data = test,
                      train.fraction = 0.5,
                      n.cores = 1,
                      n.trees = 30,
                      cv.folds = 1,
                      keep.data = TRUE,
                      verbose=TRUE)
    

    This gives the error: object 'p' not found

    Set cv.folds = 2 and it runs without the error:

    model_output <- gbm(type~ . , 
                      distribution = "bernoulli",
                      var.monotone = rep(0,7),
                      data = test,
                      train.fraction = 0.5,
                      n.cores = 1,
                      n.trees = 30,
                      cv.folds = 2,
                      keep.data = TRUE,
                      verbose=TRUE)
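
Because the gbm code quoted above does not reject cv.folds = 1 on its own, one option is to validate the value before fitting. Below is a minimal sketch in base R; check_cv_folds() is a hypothetical helper introduced here for illustration and is not part of the gbm package.

```r
# Hypothetical helper (not part of gbm): validate cv.folds before calling gbm().
check_cv_folds <- function(cv.folds) {
  valid <- is.numeric(cv.folds) && length(cv.folds) == 1 &&
    !is.na(cv.folds) && cv.folds >= 0 && cv.folds == round(cv.folds)
  if (!valid) {
    stop("cv.folds must be a single non-negative whole number")
  }
  if (cv.folds == 1) {
    # One fold leaves nothing to hold out, so treat it as a usage error.
    stop("cv.folds = 1 is not meaningful; use 0 to disable cross-validation or 2 or more to enable it")
  }
  invisible(cv.folds)
}

# The problematic value is caught with a readable message.
msg <- tryCatch(check_cv_folds(1), error = function(e) conditionMessage(e))

# Valid values are passed through unchanged.
folds <- check_cv_folds(2)
```

The returned value can be passed straight to the cv.folds argument of gbm(), so a bad setting fails early with a clearer message than object 'p' not found.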