When running the gbm function for a classification problem, I get the following error:
Error in res[flag, ] <- predictions : replacement has length zero
I would like to know why I get this error and how to solve it.
My data consists of about 77 numeric variables (integers) to be used in the classification, plus one grouping factor. There are no other variables and no missing values. The grouping factor is coded as a factor (0,1) as required.
The structure of my data looks something like this:
$Group : Factor w/ 2 levels "0", "1"
$it1 : int
...
$it70 : int
My model looks like this:
mod_gbm <- gbm(Group ~ ., distribution = "bernoulli", data = df,
               n.trees = 1000, shrinkage = .01, n.minobsinnode = 5,
               interaction.depth = 6, cv.folds = 5)
I realize this question is very similar to "Problems in using GBM function to do classification in R", but that person was asking about a numeric variable, and the only response was to remove cv.folds. I would like to keep cv.folds in my model and still have it run.
If you check the help page of gbm (?gbm), the distribution argument is described as:
distribution: Either a character string specifying the name of the
distribution to use or a list with a component ‘name’
specifying the distribution and any additional parameters
needed. If not specified, ‘gbm’ will try to guess: if the
response has only 2 unique values, bernoulli is assumed;
otherwise, if the response is a factor, multinomial is
assumed
If you only have two classes, you don't need to convert the response into a factor; a numeric 0/1 response is enough. We can explore this with the iris example, where I create a 0/1 group label:
library(gbm)
df = iris
df$Group = factor(as.numeric(df$Species=="versicolor"))
df$Species = NULL
mod_gbm <- gbm(Group~.,distribution ="bernoulli", data=df,cv.folds=5)
Error in res[flag, ] <- predictions : replacement has length zero
I get the same error. Converting the response to numeric 0/1 makes it run correctly.
When the variable is a factor, as.numeric() converts it to the level codes 1, 2, with 1 corresponding to the first level. So in this case, since Group is 0/1 to start with, subtracting 1 recovers the original values:
df$Group = as.numeric(df$Group)-1
mod_gbm <- gbm(Group~.,distribution ="bernoulli", data=df,cv.folds=5)
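The factor-to-numeric step can be checked in isolation. This is a minimal base R sketch; the vector `group` and the names `codes`, `shifted` and `from_labels` are made up for illustration. It also shows an alternative conversion through the labels, which does not rely on the levels being "0" and "1" in that order:

```r
# A 0/1 response stored as a factor, as in the question (illustrative data)
group <- factor(c(0, 1, 1, 0, 1))

# as.numeric() returns the internal level codes (1, 2), not the labels
codes <- as.numeric(group)

# Option 1: shift the codes down by one
# (valid only when the levels are "0", "1" in that order)
shifted <- as.numeric(group) - 1

# Option 2: convert via the labels, independent of level order
from_labels <- as.numeric(as.character(group))

print(codes)        # 1 2 2 1 2
print(shifted)      # 0 1 1 0 1
print(from_labels)  # 0 1 1 0 1
```

Either option gives a numeric 0/1 vector that can be passed to gbm with distribution = "bernoulli".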
And we get the predictions:
pred = ifelse(predict(mod_gbm,type="response")>0.5,1,0)
table(pred,df$Group)
pred  0  1
   0 98  3
   1  2 47
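Since the point of keeping cv.folds is to use the cross-validation results, one possible follow-up is to let gbm.perf pick the number of trees from the CV error and predict with that. This is only a sketch on the same iris setup: the seed and the names `best_iter` and `prob` are illustrative, and the chosen iteration will vary from run to run:

```r
library(gbm)

set.seed(1)  # illustrative seed, only to make the run more repeatable

# Same 0/1 label as above, kept numeric from the start
df <- iris
df$Group <- as.numeric(df$Species == "versicolor")
df$Species <- NULL

# Fit with cross-validation enabled
mod_gbm <- gbm(Group ~ ., distribution = "bernoulli", data = df, cv.folds = 5)

# Number of trees that minimises the cross-validated error
best_iter <- gbm.perf(mod_gbm, method = "cv", plot.it = FALSE)

# Predicted probabilities at that iteration, thresholded at 0.5
prob <- predict(mod_gbm, newdata = df, n.trees = best_iter, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
table(pred, df$Group)
```

The same pattern should carry over to the original model by replacing the iris data with the real data frame and adding the n.trees, shrinkage, n.minobsinnode and interaction.depth settings from the question.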