I'm searching for the optimal hyperparameter settings, and I realise I can do that in two ways in mlr: with the benchmark function and with the resample function. What is the difference between the two?
If I do it via benchmark, I can compare multiple models and extract the tuned parameters, which is an advantage over resample. If I use resample instead, I can only tune one model at a time, and I also notice that my CPU usage skyrockets.
How and when should I use one over the other?
data(BostonHousing, package = "mlbench")
BostonHousing$chas <- as.integer(levels(BostonHousing$chas))[BostonHousing$chas]
library("mlr")
library("parallel")
library("parallelMap")
# ---- define learning tasks -------
regr.task = makeRegrTask(id = "bh", data = BostonHousing, target = "medv")
# ---- tune Hyperparameters --------
set.seed(1234)
# Define a search space for each learner's parameters
ps_xgb = makeParamSet(
makeIntegerParam("nrounds",lower=5,upper=50),
makeIntegerParam("max_depth",lower=3,upper=15),
# makeNumericParam("lambda",lower=0.55,upper=0.60),
# makeNumericParam("gamma",lower=0,upper=5),
makeNumericParam("eta", lower = 0.01, upper = 1),
makeNumericParam("subsample", lower = 0, upper = 1),
makeNumericParam("min_child_weight",lower=1,upper=10),
makeNumericParam("colsample_bytree",lower = 0.1,upper = 1)
)
# Choose a resampling strategy
rdesc = makeResampleDesc("CV", iters = 5L)
# Choose a performance measure
meas = rmse
# Choose a tuning method
ctrl = makeTuneControlRandom(maxit = 30L)
# Make tuning wrappers
tuned.lm = makeLearner("regr.lm")
tuned.xgb = makeTuneWrapper(learner = "regr.xgboost", resampling = rdesc, measures = meas,
par.set = ps_xgb, control = ctrl, show.info = FALSE)
# -------- Benchmark experiments -----------
# Two learners to be compared
lrns = list(tuned.lm, tuned.xgb)
# Set up parallelization
parallelStart(mode = "socket", #multicore #socket
cpus = detectCores(),
# level = "mlr.tuneParams",
mc.set.seed = TRUE)
# Conduct the benchmark experiment
bmr = benchmark(learners = lrns,
tasks = regr.task,
resamplings = rdesc,
measures = rmse,
keep.extract = TRUE,
models = FALSE,
show.info = F)
parallelStop()
# ------ Extract HyperParameters -----
bmr_hp <- getBMRTuneResults(bmr)
bmr_hp$bh$regr.xgboost.tuned[[1]]
res <-
resample(
tuned.xgb,
regr.task,
resampling = rdesc,
extract = getTuneResult, #getFeatSelResult, getTuneResult
show.info = TRUE,
measures = meas
)
res$extract
Benchmarking and resampling are orthogonal concepts -- you can use them independently or combine them.
Resampling makes sure that learned models are evaluated appropriately. In particular, we don't want to evaluate a learned model on the same data we used to train it, because then the model could simply memorize that data and look like the perfect model. Instead, we evaluate it on different, held-out data to see whether it has learned the general concept and can make good predictions on unseen data as well. The resampling strategy determines how the data is split into training and test sets, how many such train/test splits are used, and so on.
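For example, a plain resample call with no tuning involved (just a sketch that reuses the regr.task and rdesc objects defined in your code) estimates how well a single fixed learner generalises:
lrn.lm <- makeLearner("regr.lm")  # a fixed learner, nothing to tune
res.lm <- resample(lrn.lm, regr.task, resampling = rdesc, measures = rmse)
res.lm$aggr  # aggregated RMSE over the 5 CV folds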
Benchmarking allows you to compare different learners on different tasks. It's a convenient way to run large-scale comparison experiments that you would otherwise have to perform manually by combining all learners and all tasks, training and evaluating models, and making sure that everything happens in exactly the same way. To determine the performance of a learner and the models it induces on a given task, a resampling strategy is used, as outlined above.
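As a minimal sketch, again reusing your regr.task and rdesc (regr.rpart is only an arbitrary second learner for illustration and requires the rpart package), a benchmark compares two untuned learners under exactly the same resampling:
lrns.plain <- list(makeLearner("regr.lm"), makeLearner("regr.rpart"))
bmr.plain <- benchmark(learners = lrns.plain, tasks = regr.task,
                       resamplings = rdesc, measures = rmse)
getBMRAggrPerformances(bmr.plain, as.df = TRUE)  # one aggregated RMSE per learner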
So in short, the answer to your question is: use resampling when you want to evaluate the performance of a learned model, and benchmarking with resampling when you want to compare the performance of different learners on different tasks.