I estimate a model using a classif.rpart learner. The estimation is embedded in a nested resampling. When I look at the inner tuning results using mlr3tuning::extract_inner_tuning_results(bmr), the values for minbucket and minsplit are decimal numbers (for example, minbucket 0.13 or 2.81, minsplit 2.35 or 4.61). As I understand it, both are numbers of observations, so I expected them to be integers. Do you have an explanation for why these numbers are decimal? Thank you in advance!
Edit: I cannot post the original code I use, but this code shows the same behaviour, using a task from the mlr3 package.
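library(mlr3)
library(mlr3learners)      # provides classif.ranger
library(mlr3pipelines)     # provides %>>%, po() and pipeline_robustify()
library(mlr3tuning)
library(mlr3tuningspaces)  # provides lts()
library(progressr)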
# choose task
sonar <- tsk("sonar")
# choose learners
l_rpart <- lrn("classif.rpart")
l_ranger <- lrn("classif.ranger")
# add search spaces to learners
l_rpart$param_set$values <- lts("classif.rpart.default")$values
l_ranger$param_set$values <- lts("classif.ranger.default")$values
# add fallback learners
l_rpart$fallback = lrn("classif.featureless")
l_ranger$fallback = lrn("classif.featureless")
# robustify
rpart_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_rpart) %>>% mlr3pipelines::po("learner", l_rpart)
rpart_learner <- mlr3::as_learner(rpart_graph)
ranger_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_ranger) %>>% mlr3pipelines::po("learner", l_ranger)
ranger_learner <- mlr3::as_learner(ranger_graph)
# create autotuners
at_rpart <- mlr3tuning::auto_tuner(
  method = mlr3verse::tnr("random_search"),
  learner = rpart_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time = 1 * 60,
  term_evals = 4)
at_ranger <- mlr3tuning::auto_tuner(
  method = mlr3verse::tnr("random_search"),
  learner = ranger_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time = 1 * 60,
  term_evals = 4)
# create the benchmark design
design = benchmark_grid(
  tasks = sonar,
  learners = list(at_rpart, at_ranger),
  resamplings = mlr3::rsmp("cv", folds = 3))
# run the benchmark experiment
bmr = with_progress(benchmark(design, store_models = TRUE))
# show inner tuning results
mlr3tuning::extract_inner_tuning_results(bmr)
The beginning of the output looks like this. You can see that classif.rpart.minsplit and classif.rpart.minbucket are decimals rather than the integers I would expect:
mlr3tuning::extract_inner_tuning_results(bmr)
experiment iteration classif.rpart.minsplit classif.rpart.minbucket classif.rpart.cp classif.ranger.mtry.ratio classif.ranger.replace
1: 1 1 2.834898 2.9295168 -9.089721 NA NA
2: 1 2 4.515618 0.5116199 -3.805193 NA NA
3: 1 3 3.484092 2.6164599 -3.131506 NA NA
4: 2 1 NA NA NA 0.2700584 FALSE
5: 2 2 NA NA NA 0.1032228 TRUE
6: 2 3 NA NA NA 0.3427129 FALSE
Thank you again for looking into it.
minsplit and minbucket are tuned on the logarithmic scale in the default tuning space. The values you see in the archive are before the transformation, e.g. 2.834898 becomes exp(2.834898) = 17.02866. Since minsplit is defined as an integer, the value is then rounded to 17 before the model is trained. See the tuning chapter in the mlr3book for more information.
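To check the back-transformation by hand, here is a minimal sketch in base R. It only applies exp() to the classif.rpart.minsplit values copied from the output in the question; the variable names are made up for illustration, and the final conversion to an integer happens inside the library, so it is only noted in a comment rather than reproduced.

```r
# Values copied from the classif.rpart.minsplit column in the question.
# These are on the log scale, i.e. as stored before the transformation.
minsplit_log <- c(2.834898, 4.515618, 3.484092)

# Undo the log scale: this is the value on the original parameter scale.
minsplit_raw <- exp(minsplit_log)

# The first entry is the 17.02866 mentioned above; the library then
# turns it into the integer 17 before passing it to rpart (not shown here).
print(minsplit_raw)
```

The same check works for classif.rpart.minbucket and classif.rpart.cp, which are also tuned on the log scale in that tuning space.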