Why are tuned minsplit & minbucket from rpart decimal numbers?

I estimate a model using a classif.rpart learner. The estimation is embedded in a nested resampling. When I look at the inner tuning results using mlr3tuning::extract_inner_tuning_results(bmr), the values for minbucket and minsplit are decimal numbers (for example, minbucket 0.13 or 2.81, minsplit 2.35 or 4.61). As I understand it, both parameters specify numbers of observations, so I expected them to be integers. Do you have an explanation for why these numbers are decimals? Thank you in advance!

Edit: I cannot post the original code I use, but this code shows the same behaviour, using a task from the mlr3 package.


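# load packages (mlr3verse attaches the mlr3 packages used below)
library(mlr3verse)
library(progressr)  # provides with_progress()

# choose task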
sonar <- tsk("sonar")

# choose learners
l_rpart <- lrn("classif.rpart")
l_ranger <- lrn("classif.ranger")

# add search spaces to learners
l_rpart$param_set$values <- lts("classif.rpart.default")$values
l_ranger$param_set$values <- lts("classif.ranger.default")$values

# add fallback learners
l_rpart$fallback = lrn("classif.featureless")
l_ranger$fallback = lrn("classif.featureless")

# robustify
rpart_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_rpart) %>>% mlr3pipelines::po("learner", l_rpart)
rpart_learner <- mlr3::as_learner(rpart_graph)

ranger_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_ranger) %>>% mlr3pipelines::po("learner", l_ranger)
ranger_learner <- mlr3::as_learner(ranger_graph)

# create autotuners
at_rpart <- mlr3tuning::auto_tuner(
  method = mlr3verse::tnr("random_search"),
  learner = rpart_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time =  1 * 60,
  term_evals = 4)

at_ranger <- mlr3tuning::auto_tuner(
  method = mlr3verse::tnr("random_search"),
  learner = ranger_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time =  1 * 60,
  term_evals = 4)

# create the benchmark design
design = benchmark_grid(tasks = sonar,
                        learners = list(at_rpart, at_ranger),
                        resamplings = mlr3::rsmp("cv", folds = 3))

# run the benchmark experiment
bmr = with_progress(benchmark(design, 
                              store_models = TRUE))

# show inner tuning results
mlr3tuning::extract_inner_tuning_results(bmr)

The beginning of the output looks like this. You can see that classif.rpart.minsplit and classif.rpart.minbucket are decimals instead of the integers I would expect:

   experiment iteration classif.rpart.minsplit classif.rpart.minbucket classif.rpart.cp classif.ranger.mtry.ratio classif.ranger.replace
1:          1         1               2.834898               2.9295168        -9.089721                        NA                     NA
2:          1         2               4.515618               0.5116199        -3.805193                        NA                     NA
3:          1         3               3.484092               2.6164599        -3.131506                        NA                     NA
4:          2         1                     NA                      NA               NA                 0.2700584                  FALSE
5:          2         2                     NA                      NA               NA                 0.1032228                   TRUE
6:          2         3                     NA                      NA               NA                 0.3427129                  FALSE

Thank you again for looking into it.


  • minsplit and minbucket are tuned on a logarithmic scale in the default tuning space. The values you see in the archive are the values before the transformation, e.g. 2.834898 becomes exp(2.834898) = 17.02866. Since minsplit is defined as an integer, this value is then converted to the integer 17 before the model is trained. The same log scale is used for cp, which is why its values are negative. See the tuning chapter in the mlr3book for more information.
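
To make the back-transformation concrete, here is a minimal base R sketch that applies exp() to the log-scale values copied from the output in the question. The variable names are made up for illustration, and the integer conversion is an assumption: as.integer() (truncation) stands in for whatever rule the tuning space actually applies, so check the paradox documentation for the exact behaviour.

```r
# Log-scale values copied from the inner tuning results in the question.
log_minsplit  <- c(2.834898, 4.515618, 3.484092)
log_minbucket <- c(2.9295168, 0.5116199, 2.6164599)
log_cp        <- c(-9.089721, -3.805193, -3.131506)

# Undo the log transformation to get back to the learner's own scale.
minsplit_raw  <- exp(log_minsplit)
minbucket_raw <- exp(log_minbucket)
cp            <- exp(log_cp)   # cp is numeric, so no integer conversion

# ASSUMPTION: truncation via as.integer() is used here only to illustrate
# the conversion to integers; the real rule may differ in detail.
original_scale <- data.frame(
  minsplit  = as.integer(minsplit_raw),
  minbucket = as.integer(minbucket_raw),
  cp        = cp
)

print(original_scale)
```

Under this assumption, the first rpart row of the output corresponds to minsplit = 17 and minbucket = 18, and every cp value ends up as a small positive number.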