R package ranger produces errors related to a variable named "none"


I am using the ranger package in R to build a text classification model. However, when one of the variables is named "none", I get the following error:

Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) : 
  index larger than maximal 5002

I also noticed that the variable "none" was not included in the model even though it was in the dataset.

I created a reproducible example:

library(dplyr)
library(Matrix)
library(ranger)

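set.seed(1)  # make the random example repeatable

# Binary response Y and three binary predictors whose names differ only in case
Dataset <- tibble(
  Y    = sample(c(1, 0), 1000, replace = TRUE),
  none = sample(c(1, 0), 1000, replace = TRUE),
  None = sample(c(1, 0), 1000, replace = TRUE),
  NONE = sample(c(1, 0), 1000, replace = TRUE)
)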


trainset <- sample(1:1000, 800)

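# Training rows as a sparse matrix
Dataset2Train <- Dataset[trainset, ] %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)

# Held-out rows as a sparse matrix (the predict() calls below use the tibble rows directly)
Dataset2Test <- Dataset[-trainset, ] %>%
  as.matrix() %>%
  Matrix(., sparse = TRUE)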

# Sparse-matrix input: creates the model with no messages
rf <- ranger(data = Dataset2Train,
             dependent.variable.name = "Y",
             classification = TRUE)

# Tibble input: creates the model with no messages
rf2 <- ranger(data = Dataset[trainset, ],
              dependent.variable.name = "Y",
              classification = TRUE)

# Tibble input with the columns renamed to "Y", "w", "x", "z": produces an error message
rf3 <- ranger(data = Dataset[trainset, ] %>% setNames(c("Y", "w", "x", "z")),
              dependent.variable.name = "Y",
              classification = TRUE)

# Does not include the "none" variable
rf$forest$independent.variable.names
rf2$forest$independent.variable.names

# Crashes RStudio
predict(rf, data = Dataset[-trainset, ], type = "response", predict.all = TRUE)

# Creates a prediction
predict(rf2, data = Dataset[-trainset, ], type = "response", predict.all = TRUE)
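
To make the "missing variable" symptom easier to spot, a small check along the following lines may help. It is only a sketch: `dropped_predictors` is a hypothetical helper (not part of ranger), and the values passed to it below are illustrative stand-ins for `colnames(Dataset2Train)` and `rf$forest$independent.variable.names`.

```r
# Hypothetical helper (not a ranger function): returns the columns of the
# training data that are neither the response nor listed by the fitted forest.
dropped_predictors <- function(data_names, response, model_names) {
  setdiff(data_names, c(response, model_names))
}

# Illustrative values only, chosen to mirror the symptom described above.
# In a real session, pass colnames(Dataset2Train) and
# rf$forest$independent.variable.names instead.
dropped_predictors(
  data_names  = c("Y", "none", "None", "NONE"),
  response    = "Y",
  model_names = c("None", "NONE")
)
# returns "none"
```

An empty result would mean that every predictor in the data made it into the model.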

My real dataset is genuinely sparse. Predicting on it does not crash R, but it returns the error message shown at the beginning of the question.

"none" is not a reserved word in R, so why is this happening?


Solution

  • Fixed in ranger 0.10.6. Until the change is on CRAN, install the development version from GitHub via

    devtools::install_github("imbs-hl/ranger")
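
To decide whether the GitHub install is needed at all, the installed version can be compared with the fixed release. The sketch below uses only base R's `compareVersion()`; `needs_ranger_update` is a hypothetical helper name, and the version strings in the calls are examples rather than output from a real installation.

```r
# Hypothetical helper: TRUE when the given version is older than the
# release that contains the fix (0.10.6, as stated in the answer above).
needs_ranger_update <- function(installed, fixed = "0.10.6") {
  utils::compareVersion(as.character(installed), fixed) < 0
}

# Example version strings, for illustration only
needs_ranger_update("0.10.1")  # TRUE
needs_ranger_update("0.10.6")  # FALSE
needs_ranger_update("0.11.0")  # FALSE
```

In an actual session the argument would be `packageVersion("ranger")`; a `TRUE` result suggests running the `install_github()` call above.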