
Set hyperparameters to a learner in mlr after parameter tuning


I'm building a classification task in R using the mlr package. To tune the hyperparameters I'm using a validation set, and one of these parameters is the percentage of variables to keep, ranked by importance, using feature selection (the chi.squared filter method):

lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")
params <- makeParamSet(
  makeDiscreteParam("booster", values = c("gbtree", "dart")),
  makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
  makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
  makeIntegerParam("max_depth", lower = 3L, upper = 10L),
  makeNumericParam("min_child_weight", lower = 1, upper = 10),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))
rdesc = makeResampleDesc("CV", iters = 5)
ctrl <- makeTuneControlRandom(maxit = 1L)
res = tuneParams(lrn, task = valTask2016, resampling = rdesc, par.set = params, control = ctrl)

I'm not sure whether 5-fold cross-validation is needed here, but the variable res gives me all the parameters I need, including fw.perc, which prunes the variable selection in descending order of importance.

My question is: how can I use these parameters to run resampling again (this time with subsampling), but on the training data? This is what I have:

rdesc = makeResampleDesc("Subsample", iters = 5, split = 0.8)
lrn = setHyperPars(makeLearner("classif.xgboost"), par.vals = res$x)
r = resample(lrn, trainTask2016, rdesc, measures = list(mmce, fpr, fnr, timetrain))

In this case, valTask2016 is the classification task I used to validate the parameters. I used createDummyFeatures for the one-hot encoding XGBoost requires.
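For reference, a minimal sketch of what createDummyFeatures does (toy data, not my 2016 task):

```r
library(mlr)

# Toy illustration: createDummyFeatures one-hot encodes factor feature
# columns while leaving the target column untouched.
df <- data.frame(label = factor(c("yes", "no", "yes")),
                 color = factor(c("red", "blue", "red")))
dummied <- createDummyFeatures(df, target = "label")
# the factor column 'color' is replaced by 0/1 indicator columns
```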

And this is the error I got:

Error in setHyperPars2.Learner(learner, insert(par.vals, args)) : classif.xgboost: Setting parameter fw.perc without available description object! Did you mean one of these hyperparameters instead: booster eta alpha


Solution

  • I believe the reason you get this error is that your second learner is a "plain" xgboost learner, not an xgboost learner wrapped by a filter like your first one (makeFilterWrapper returns a wrapped learner).

    So, you have two options:

    1. You define a new parameter set for your second training, copying only the part of res$x that refers to xgboost itself, i.e. dropping fw.perc
    2. You wrap your second xgboost learner by the same filter

    I hope this makes sense.
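    For the first option, a minimal sketch (assuming res is the result of the tuneParams call above; the name xgb_pars is mine):

    ```r
    library(mlr)

    # Option 1 (sketch): drop the filter parameter fw.perc from the tuned
    # values, then set only the xgboost parameters on a plain learner.
    xgb_pars <- res$x[setdiff(names(res$x), "fw.perc")]
    lrn2 <- setHyperPars(makeLearner("classif.xgboost"), par.vals = xgb_pars)
    ```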

    EDIT: This worked for me for the second option using the titanic dataset:

    library(mlr)
    library(dplyr)
    library(titanic)
    
    sample <- sample.int(n = nrow(titanic_train), size = floor(0.7 * nrow(titanic_train)), replace = FALSE)
    train <- titanic_train[sample, ] %>%
      select(Pclass, Sex, Age, SibSp, Fare, Survived) %>%
      mutate(Sex = ifelse(Sex == 'male', 0, 1))
    
    lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")
    
    params <- makeParamSet(
      makeDiscreteParam("booster", values = c("gbtree", "dart")),
      makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
      makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
      makeIntegerParam("max_depth", lower = 3L, upper = 10L),
      makeNumericParam("min_child_weight", lower = 1, upper = 10),
      makeNumericParam("subsample", lower = 0.5, upper = 1),
      makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
      makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))
    
    classif.task <- mlr::makeClassifTask(data = train,
                                         target = "Survived",
                                         positive = "1")
    
    rdesc = makeResampleDesc("CV", iters = 3)
    
    ctrl <- makeTuneControlRandom(maxit = 2L)
    
    res = tuneParams(lrn, task = classif.task, resampling = rdesc, par.set = params, control = ctrl)
    
    ##########################
    
    test <- titanic_train[-sample, ] %>%
      select(Pclass, Sex, Age, SibSp, Fare, Survived) %>%
      mutate(Sex = ifelse(Sex == 'male', 0, 1))
    
    lrn2 = setHyperPars(makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared"), par.vals = res$x)
    
    classif.task2 <- mlr::makeClassifTask(data = test,
                                          target = "Survived",
                                          positive = "1")
    
    rdesc = makeResampleDesc("CV", iters = 3)
    r = resample(learner = lrn2, task = classif.task2, resampling = rdesc, show.info = TRUE, models = TRUE)