I am using the ranger package in R to build a text classification model. However, when one of the variables is named "none", I get the following error:
Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
index larger than maximal 5002
I also noticed that the variable "none" was not included in the model even though it was in the dataset.
I created a reproducible example:
library(dplyr)
library(Matrix)
library(ranger)
Dataset <- tibble(
  Y = sample(c(1, 0), 1000, replace = TRUE),
  none = sample(c(1, 0), 1000, replace = TRUE),
  None = sample(c(1, 0), 1000, replace = TRUE),
  NONE = sample(c(1, 0), 1000, replace = TRUE)
)
trainset <- sample(1:1000, 800)
Dataset2Train <- Dataset[trainset, ] %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)
Dataset2Test <- Dataset[-trainset, ] %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)
# Creates model with no messages
rf <- ranger(data = Dataset2Train,
             dependent.variable.name = "Y",
             classification = TRUE)
# Creates model with no messages
rf2 <- ranger(data = Dataset[trainset, ],
              dependent.variable.name = "Y",
              classification = TRUE)
# Produces error message
rf3 <- ranger(data = Dataset[trainset, ] %>% setNames(c("Y", "w", "x", "z")),
              dependent.variable.name = "Y",
              classification = TRUE)
# Does not include the none variable
rf$forest$independent.variable.names
rf2$forest$independent.variable.names
# Crashes RStudio
predict(rf, data = Dataset[-trainset, ], type = "response", predict.all = TRUE)
# Creates a prediction
predict(rf2, data = Dataset[-trainset, ], type = "response", predict.all = TRUE)
My real dataset is properly sparse. Predicting on it does not crash R, but it returns the error message shown at the beginning of the question.
"none" is not a reserved word in R, so why is this happening?
This is fixed in ranger 0.10.6. Until that version is on CRAN, install it from GitHub via
devtools::install_github("imbs-hl/ranger")
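To confirm whether the installed copy already contains the fix, a quick version check along these lines may help. This is only a sketch: `has_fix` is a hypothetical helper name (not part of ranger), and the only fact it relies on is the 0.10.6 version number given above.

```r
# Hypothetical helper: TRUE if `installed` is at least the version
# in which the problem is reported as fixed (0.10.6, per the answer above).
has_fix <- function(installed, fixed_in = "0.10.6") {
  package_version(installed) >= package_version(fixed_in)
}

# Report on the locally installed ranger, if there is one.
if (requireNamespace("ranger", quietly = TRUE)) {
  v <- as.character(utils::packageVersion("ranger"))
  message("ranger ", v, if (has_fix(v)) " includes the fix" else " predates the fix")
} else {
  message("ranger is not installed")
}
```

If the message says the installed version predates the fix, reinstall with the `install_github` call above and restart the R session before rerunning the example.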