Tags: r, benchmarking, mlr3

Benchmarking multiple AutoTuning instances


I have been trying to use mlr3 to do some hyperparameter tuning for xgboost. I want to compare three different models:

  1. xgboost tuned over just the alpha hyperparameter
  2. xgboost tuned over alpha and lambda hyperparameters
  3. xgboost tuned over alpha, lambda, and max_depth hyperparameters.

After reading the mlr3 book, I thought that using AutoTuner for the nested resampling and benchmarking would be the best way to go about doing this. Here is what I have tried:

library(mlr3verse)

task_mpcr <- TaskRegr$new(id = "mpcr", backend = data.numeric, target = "n_reads")

measure <- msr("poisson_loss")

xgb_learn <- lrn("regr.xgboost")

set.seed(103)
fivefold.cv = rsmp("cv", folds = 5)

param.list <- list(
  alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  max_depth = p_int(lower = 2, upper = 10)
)


model.list <- list()
for(model.i in 1:length(param.list)){

  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)

  model.list[[model.i]] <- AutoTuner$new(
    learner = xgb_learn,
    resampling = fivefold.cv,
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 10),
    store_tuning_instance = TRUE
  )
}
grid <- benchmark_grid(
  task = task_mpcr,
  learner = model.list,
  resampling = rsmp("cv", folds = 3)
)

bmr <- benchmark(grid, store_models = TRUE)

Note that I added Poisson loss as a measure for the count data I am working with. For some reason after running the benchmark function, the Poisson loss of all my models is nearly identical per fold, making me think that no tuning was done.
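
For reference, "poisson_loss" is not a built-in mlr3 measure; I registered it as a custom measure roughly along the lines of the sketch below (the class name and the exact loss formula here are illustrative, not necessarily the precise measure I used).

# Sketch of a custom Poisson negative log-likelihood measure (illustrative only)
MeasurePoissonLoss <- R6::R6Class("MeasurePoissonLoss",
  inherit = mlr3::MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "poisson_loss",
        range = c(0, Inf),
        minimize = TRUE,           # lower loss is better
        predict_type = "response"
      )
    }
  ),
  private = list(
    .score = function(prediction, ...) {
      truth <- prediction$truth
      response <- pmax(prediction$response, 1e-12) # guard against log(0)
      mean(response - truth * log(response) + lgamma(truth + 1))
    }
  )
)

# register the measure so that msr("poisson_loss") can find it
mlr3::mlr_measures$add("poisson_loss", MeasurePoissonLoss)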

I also cannot find a way to access the hyperparameters used to get the lowest loss per train/test iteration. Am I misusing the benchmark function entirely? Also, this is my first question on SO, so any formatting advice would be appreciated!


Solution

  • To see whether tuning has an effect, you can simply add an untuned learner to the benchmark as a baseline. If the scores are still nearly identical, the conclusion could be that tuning alpha alone is already sufficient for your example.

    I adapted the code so that it runs with an example task.

    library(mlr3verse)
    
    task <- tsk("mtcars")
    
    measure <- msr("regr.rmse")
    
    xgb_learn <- lrn("regr.xgboost")
    
    param.list <- list(
      alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
      lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE)
    )
    
    model.list <- list()
    for(model.i in 1:length(param.list)){
      
      param.list.subset <- param.list[1:model.i]
      search_space <- do.call(ps, param.list.subset)
      
      at <- AutoTuner$new(
        learner = xgb_learn,
        resampling = rsmp("cv", folds = 5),
        measure = measure,
        search_space = search_space,
        terminator = trm("none"),
        tuner = tnr("grid_search", resolution = 5),
        store_tuning_instance = TRUE
      )
      at$id = paste0(at$id, model.i)
      
      model.list[[model.i]] <- at
    }
    
    model.list <- c(model.list, list(xgb_learn)) # add baseline learner
    
    grid <- benchmark_grid(
      task = task,
      learner = model.list,
      resampling = rsmp("cv", folds = 3)
    )
    
    bmr <- benchmark(grid, store_models = TRUE)
    
    autoplot(bmr)
    
    bmr_data = as.data.table(bmr) # convert the benchmark result to a handy data.table
    bmr_data$learner[[1]]$learner$param_set$values # the final learner used by the AutoTuner is nested in $learner
    
    # best hyperparameter configuration found during the grid search
    bmr_data$learner[[1]]$archive$best()
    
    # transformed value (the one that is used for the learner)
    bmr_data$learner[[1]]$archive$best()$x_domain
    

    The last lines show how to access the individual runs of the benchmark. In my example there are 9 runs, resulting from 3 learners (two AutoTuners plus the untuned baseline) and 3 outer resampling folds.
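
    To get the hyperparameters that achieved the lowest loss in each outer train/test iteration, the same pattern can be looped over all runs. Here is a brief sketch, assuming bmr_data was built as above; mlr3tuning also provides extract_inner_tuning_results() for this purpose (it requires store_models = TRUE).

    # best configuration found by the inner grid search of each outer run;
    # the untuned baseline learner has no tuning archive, so it is skipped
    best_per_run <- lapply(bmr_data$learner, function(l) {
      if (inherits(l, "AutoTuner")) l$archive$best() else NULL
    })
    best_per_run

    # alternative: one combined table of all inner tuning results
    # extract_inner_tuning_results(bmr)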