I am aware of the question "GBM: Object 'p' not found"; however, it did not contain enough information for the community to answer it. I don't believe this is a duplicate: I've followed what was suggested in that question and in its linked duplicate, "Error in R gbm function when cv.folds > 0", which does not describe the same error.
I have followed the recommendation to leave out any columns that are not used in the model.
This error appears when cv.folds is greater than 0:
object 'p' not found
From what I can see, setting cv.folds to 0 does not produce meaningful output either. I have tried different distributions, training fractions, numbers of trees, etc. I'm confident I've parameterized something incorrectly, but I can't for the life of me see what it is.
Model and output:
model_output <- gbm(formula = ign ~ .,
                    distribution = "bernoulli",
                    var.monotone = rep(0, 9),
                    data = model_sample,
                    train.fraction = 0.50,
                    n.cores = 1,
                    n.trees = 150,
                    cv.folds = 1,
                    keep.data = TRUE,
                    verbose = TRUE)
Iter TrainDeviance ValidDeviance StepSize Improve
1 nan nan 0.1000 nan
2 nan nan 0.1000 nan
3 nan nan 0.1000 nan
4 nan nan 0.1000 nan
5 nan nan 0.1000 nan
6 nan nan 0.1000 nan
7 nan nan 0.1000 nan
8 nan nan 0.1000 nan
9 nan nan 0.1000 nan
10 nan nan 0.1000 nan
20 nan nan 0.1000 nan
40 nan nan 0.1000 nan
60 nan nan 0.1000 nan
80 nan nan 0.1000 nan
100 nan nan 0.1000 nan
120 nan nan 0.1000 nan
140 nan nan 0.1000 nan
150 nan nan 0.1000 nan
A minimal dataset that reproduced the error used to be here; however, once the suggestion by @StupidWolf is applied, that dataset is too small. The suggestion below gets past the initial error. Subsequent errors are occurring, and I will post solutions here as I find them.
gbm is not meant to handle the case where someone sets cv.folds = 1. By definition, k-fold cross-validation splits the data into k parts and, for each part in turn, trains on the other k-1 parts and tests on the held-out one. So I am not sure what 1-fold cross-validation would even mean, since there would be nothing left to train on. If you look at the code for gbm, at line 437:
if(cv.folds > 1) {
cv.results <- gbmCrossVal(cv.folds = cv.folds, nTrain = nTrain,
....
p <- cv.results$predictions
}
This is where the cross-validated predictions p are computed. Later, when the results are collected into the gbm object, at line 471:
if (cv.folds > 0) {
gbm.obj$cv.fitted <- p
}
So with cv.folds = 1, p is never calculated (because 1 > 1 is false), but the later check cv.folds > 0 is true, hence the error.
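To see the failure mode in isolation, here is a minimal sketch that mimics only the two conditions quoted above. The function name fit_like_gbm and the placeholder string are hypothetical stand-ins, not gbm's actual code; the placeholder takes the place of the real gbmCrossVal() call.

```r
# Hypothetical stand-in that reproduces only the two quoted conditions;
# it is NOT gbm's real implementation.
fit_like_gbm <- function(cv.folds) {
  if (cv.folds > 1) {
    # In gbm, p would hold the cross-validated predictions from gbmCrossVal()
    p <- "cv predictions"
  }
  obj <- list()
  if (cv.folds > 0) {
    # With cv.folds = 1 this branch runs, but p was never assigned
    obj$cv.fitted <- p
  }
  obj
}

fit_like_gbm(0)       # no CV: neither branch runs, returns an empty list
fit_like_gbm(2)       # k-fold CV: p is assigned, then stored in cv.fitted
try(fit_like_gbm(1))  # fails with object 'p' not found
```

The last call fails for the same reason gbm does, assuming no global variable named p happens to exist (R's lexical scoping would otherwise find it).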
Below is a reproducible example:
library(MASS)  # provides the Pima.tr dataset
library(gbm)
test = Pima.tr
test$type = as.numeric(test$type) - 1  # recode factor No/Yes to 0/1 for bernoulli
model_output <- gbm(type ~ .,
                    distribution = "bernoulli",
                    var.monotone = rep(0, 7),
                    data = test,
                    train.fraction = 0.5,
                    n.cores = 1,
                    n.trees = 30,
                    cv.folds = 1,
                    keep.data = TRUE,
                    verbose = TRUE)
This gives me the error object 'p' not found.
Set cv.folds = 2 and it runs smoothly:
model_output <- gbm(type ~ .,
                    distribution = "bernoulli",
                    var.monotone = rep(0, 7),
                    data = test,
                    train.fraction = 0.5,
                    n.cores = 1,
                    n.trees = 30,
                    cv.folds = 2,
                    keep.data = TRUE,
                    verbose = TRUE)
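Since cv.folds = 1 slips past the checks and only fails later, one option is to validate the value yourself before calling gbm(). The helper check_cv_folds below is a hypothetical sketch, not part of gbm: it accepts 0 (no cross-validation) or a whole number from 2 up to the number of rows, and stops with a clear message otherwise.

```r
# Hypothetical helper (not part of gbm): validate cv.folds before calling gbm()
check_cv_folds <- function(cv.folds, n_rows) {
  if (!is.numeric(cv.folds) || length(cv.folds) != 1 ||
      is.na(cv.folds) || cv.folds != round(cv.folds)) {
    stop("cv.folds must be a single whole number")
  }
  if (cv.folds == 1) {
    stop("cv.folds = 1 is not meaningful: use 0 to skip CV or >= 2 for k-fold CV")
  }
  if (cv.folds < 0 || cv.folds > n_rows) {
    stop("cv.folds must be 0 or between 2 and the number of rows")
  }
  as.integer(cv.folds)
}

check_cv_folds(2, 200)  # returns 2L
check_cv_folds(0, 200)  # returns 0L (no cross-validation)
```

In the example above it could be used as cv.folds = check_cv_folds(2, nrow(test)), so a mistaken 1 fails immediately with an explicit message instead of the opaque object 'p' not found.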