
gbm package and quantile regression


Could someone please point out the correct use of the quantile distribution option in the gbm package? This:

library(datasets)
library(gbm)
library(caret)

set.seed(42)
rm(list = ls())

model <- gbm(Petal.Width ~ Petal.Length

                        , distribution = list(name = "quantile", alpha = 0.4)
                        , data = iris
                        , n.trees = number_of_trees
                        , interaction.depth = 3
                        , shrinkage = 0.01
                        , n.minobsinnode = 10
    )
model

Does not work. I get:

Error in if (!is.element(distribution$name, getAvailableDistributions())) { : 
  argument is of length zero
Error: object 'model' not found

Thanks!


Solution

  • This was a bug in gbm, reported in GitHub issues #29 and #27, and it has since been fixed in the development repository. Until the fixed version reaches CRAN, you can do quantile regression with the GitHub development version:

    devtools::install_github("gbm-developers/gbm")
    #> Downloading GitHub repo gbm-developers/gbm@master
    #> from URL https://api.github.com/repos/gbm-developers/gbm/zipball/master
    #> Installing gbm
    #> '/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore  \
    #>   --quiet CMD INSTALL  \
    #>   '/tmp/Rtmp4acgli/devtools55756447fca5/gbm-developers-gbm-0e07a6b'  \
    #>   --library='/home/duckmayr/R/x86_64-pc-linux-gnu-library/3.5' --install-tests
    #> 
    #> Reloading installed gbm
    #> Loaded gbm 2.1.4.9000
    library(datasets)
    library(gbm)
    # library(caret) # this package isn't used
    
    set.seed(42)
    rm(list = ls())
    
    model <- gbm(Petal.Width ~ Petal.Length
    
                 , distribution = list(name = "quantile", alpha = 0.4)
                 , data = iris
                 , n.trees = 3 # number_of_trees -- this variable isn't given by OP
                 , interaction.depth = 3
                 , shrinkage = 0.01
                 , n.minobsinnode = 10
    )
    model
    #> gbm(formula = Petal.Width ~ Petal.Length, distribution = list(name = "quantile", 
    #>     alpha = 0.4), data = iris, n.trees = 3, interaction.depth = 3, 
    #>     n.minobsinnode = 10, shrinkage = 0.01)
    #> A gradient boosted model with quantile loss function.
    #> 3 iterations were performed.
    #> There were 1 predictors of which 1 had non-zero influence.
    

    but not the CRAN version:

    install.packages("gbm")
    #> Installing package into '/home/duckmayr/R/x86_64-pc-linux-gnu-library/3.5'
    #> (as 'lib' is unspecified)
    library(datasets)
    library(gbm)
    #> Loaded gbm 2.1.4
    # library(caret) # this package isn't used
    
    set.seed(42)
    rm(list = ls())
    
    model <- gbm(Petal.Width ~ Petal.Length
    
                 , distribution = list(name = "quantile", alpha = 0.4)
                 , data = iris
                 , n.trees = 3 # number_of_trees -- this variable isn't given by OP
                 , interaction.depth = 3
                 , shrinkage = 0.01
                 , n.minobsinnode = 10
    )
    #> Error in if (!is.element(distribution$name, getAvailableDistributions())) {: argument is of length zero
    model
    #> Error in eval(expr, envir, enclos): object 'model' not found
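
To tell which of the two builds is installed, a version comparison along these lines may help. This is only a sketch: has_fix() is a hypothetical helper, not part of gbm, and it assumes that any build at or above the development version shown in the transcript (2.1.4.9000) contains the fix.

```r
# has_fix() is a hypothetical helper, not part of gbm.
# Assumption: builds at or above the development version from the
# transcript (2.1.4.9000) contain the fix.
has_fix <- function(version) {
  package_version(version) >= package_version("2.1.4.9000")
}

has_fix("2.1.4")       # the CRAN build from the transcript
has_fix("2.1.4.9000")  # the development build from the transcript

# Check the locally installed copy, if there is one
if (requireNamespace("gbm", quietly = TRUE)) {
  print(has_fix(as.character(utils::packageVersion("gbm"))))
}
```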
    

    The issue was caused by this bit of code:

    distribution <- if (missing(distribution)) {
      if (missing(distribution)) {
        y <- data[, all.vars(formula)[1L], drop = TRUE]
        guessDist(y) 
      } else if (is.character(distribution)) { 
        distribution <- list(name = distribution) 
      } 
    }
    

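    Notice that this code never handles the case where the user passes a named list, even though the documentation says that is allowed. With a list, no branch produces a value, so the if expression evaluates to NULL, distribution becomes NULL, and the later check on distribution$name sees a zero-length value, hence "argument is of length zero". That bit of code has now been fixed: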

    if (missing(distribution)) {
      y <- data[, all.vars(formula)[1L], drop = TRUE]
      distribution <- guessDist(y) 
    }
    
    if (is.character(distribution)) { 
      distribution <- list(name = distribution) 
    }
    

    That way, if distribution is already a list, it is left untouched.
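
The difference between the two versions can be reproduced without gbm at all. The sketch below uses simplified stand-ins: normalize_buggy(), normalize_fixed() and guess_dist() are hypothetical names invented for illustration, and the buggy variant is a reduced form of the old logic that behaves the same way for list input, not a copy of the package source.

```r
# guess_dist() is a placeholder for gbm's internal guessDist()
guess_dist <- function() list(name = "gaussian")

# Old shape: the result of an if/else-if chain is assigned back.
# A list matches no branch, so the chain evaluates to NULL.
normalize_buggy <- function(distribution) {
  distribution <- if (missing(distribution)) {
    guess_dist()
  } else if (is.character(distribution)) {
    list(name = distribution)
  }
  distribution
}

# New shape: two independent checks that only overwrite when needed.
normalize_fixed <- function(distribution) {
  if (missing(distribution)) {
    distribution <- guess_dist()
  }
  if (is.character(distribution)) {
    distribution <- list(name = distribution)
  }
  distribution
}

q <- list(name = "quantile", alpha = 0.4)

is.null(normalize_buggy(q))          # the list is lost
identical(normalize_fixed(q), q)     # the list passes through unchanged
normalize_fixed("quantile")$name     # a bare string is still wrapped

# NULL$name is NULL, and is.element(NULL, ...) has length zero,
# which is what if() complains about in the error message.
length(is.element(NULL$name, c("gaussian", "quantile")))
```

The design point is that an if without a final else silently returns NULL when no branch matches, so assigning its result is only safe when every input type has a branch.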