Tags: r, benchmarking, mlr3

Benchmarking multiple AutoTuning instances


I have been trying to use mlr3 to do some hyperparameter tuning for xgboost. I want to compare three different models:

  1. xgboost tuned over just the alpha hyperparameter
  2. xgboost tuned over alpha and lambda hyperparameters
  3. xgboost tuned over alpha, lambda, and max_depth hyperparameters.

After reading the mlr3 book, I thought that using AutoTuner for the nested resampling and benchmarking would be the best way to go about doing this. Here is what I have tried:

library(mlr3verse)

task_mpcr <- TaskRegr$new(id = "mpcr", backend = data.numeric, target = "n_reads")

measure <- msr("poisson_loss")

xgb_learn <- lrn("regr.xgboost")

set.seed(103)
fivefold.cv = rsmp("cv", folds = 5)

param.list <- list(
  alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  max_depth = p_int(lower = 2, upper = 10)
)


model.list <- list()
for(model.i in 1:length(param.list)){

  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)

  model.list[[model.i]] <- AutoTuner$new(
    learner = xgb_learn,
    resampling = fivefold.cv,
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 10),
    store_tuning_instance = TRUE
  )
}
grid <- benchmark_grid(
  task = task_mpcr,
  learner = model.list,
  resampling = rsmp("cv", folds = 3)
)

bmr <- benchmark(grid, store_models = TRUE)

Note that I added Poisson loss as a measure for the count data I am working with. For some reason after running the benchmark function, the Poisson loss of all my models is nearly identical per fold, making me think that no tuning was done.
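
For reference, "poisson_loss" is not a built-in mlr3 measure; I registered it as a custom measure roughly along the lines of the sketch below (the class name and the exact loss formula here are illustrative, not necessarily the precise measure I used).

# Sketch of a custom Poisson negative log-likelihood measure (illustrative only)
MeasurePoissonLoss <- R6::R6Class("MeasurePoissonLoss",
  inherit = mlr3::MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "poisson_loss",
        range = c(0, Inf),
        minimize = TRUE,           # lower loss is better
        predict_type = "response"
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      truth <- prediction$truth
      response <- pmax(prediction$response, 1e-12) # guard against log(0)
      mean(response - truth * log(response) + lgamma(truth + 1))
    }
  )
)

# register the measure so that msr("poisson_loss") can find it
mlr3::mlr_measures$add("poisson_loss", MeasurePoissonLoss)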

I also cannot find a way to access the hyperparameters used to get the lowest loss per train/test iteration. Am I misusing the benchmark function entirely? Also, this is my first question on SO, so any formatting advice would be appreciated!


Solution

  • To see whether tuning has an effect, you can simply add an untuned learner to the benchmark as a baseline. If the scores are still nearly identical, the conclusion could be that tuning alpha alone is already sufficient for your example.

    I adapted the code so that it runs with an example task.

    library(mlr3verse)
    
    task <- tsk("mtcars")
    
    measure <- msr("regr.rmse")
    
    xgb_learn <- lrn("regr.xgboost")
    
    param.list <- list(
      alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
      lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE)
    )
    
    model.list <- list()
    for(model.i in 1:length(param.list)){
      
      param.list.subset <- param.list[1:model.i]
      search_space <- do.call(ps, param.list.subset)
      
      at <- AutoTuner$new(
        learner = xgb_learn,
        resampling = rsmp("cv", folds = 5),
        measure = measure,
        search_space = search_space,
        terminator = trm("none"),
        tuner = tnr("grid_search", resolution = 5),
        store_tuning_instance = TRUE
      )
      at$id = paste0(at$id, model.i)
      
      model.list[[model.i]] <- at
    }
    
    model.list <- c(model.list, list(xgb_learn)) # add baseline learner
    
    grid <- benchmark_grid(
      task = task,
      learner = model.list,
      resampling = rsmp("cv", folds = 3)
    )
    
    bmr <- benchmark(grid, store_models = TRUE)
    
    autoplot(bmr)
    
    bmr_data = as.data.table(bmr) # convert the benchmark result to a handy data.table
    bmr_data$learner[[1]]$learner$param_set$values # the final learner used by the AutoTuner is nested in $learner
    
    # best hyperparameter configuration found during the grid search
    bmr_data$learner[[1]]$archive$best()
    
    # transformed value (the one that is used for the learner)
    bmr_data$learner[[1]]$archive$best()$x_domain
    

    The last lines show how to access the individual runs of the benchmark. In my example there are 9 runs, resulting from 3 learners (two AutoTuners plus the untuned baseline) and 3 outer resampling folds.
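
    To get the hyperparameters that achieved the lowest loss in each outer train/test iteration, the same pattern can be looped over all runs. Here is a brief sketch, assuming bmr_data was built as above; mlr3tuning also provides extract_inner_tuning_results() for this purpose (it requires store_models = TRUE).

    # best configuration found by the inner grid search of each outer run;
    # the untuned baseline learner has no tuning archive, so it is skipped
    best_per_run <- lapply(bmr_data$learner, function(l) {
      if (inherits(l, "AutoTuner")) l$archive$best() else NULL
    })
    best_per_run

    # alternative: one combined table of all inner tuning results
    # extract_inner_tuning_results(bmr)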