Tags: benchmarking, mlr3

Using pre-defined train and test sets in a benchmark in mlr3


I would like to compare several machine learning algorithms on a classification task using the benchmark_grid() function in mlr3. According to https://mlr3book.mlr-org.com/benchmarking.html, benchmark_grid() takes a resampling scheme to partition the data in the task into training and test sets. However, I would like to use a manual partitioning. How can I specify the training and test sets manually when using benchmark_grid()?
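
For reference, the book's standard approach lets a resampling scheme create the split at random; a minimal sketch for contrast, using a holdout resampling:

library("mlr3verse")

task = tsk("iris")
# rsmp("holdout") draws a random train/test split; ratio sets the train fraction
holdout = rsmp("holdout", ratio = 0.8)
holdout$instantiate(task)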

EDIT: Code example based on the suggestion by pat-s

# use benchmark() from mlr3 to compare different classification models on the iris data set using a manually
# pre-defined partitioning into training and test data sets (hold-out sampling)

library("mlr3verse")

# Instantiate Task
task = tsk("iris")

# Instantiate Custom Resampling

# hold-out sample with pre-defined partitioning into train and test set
custom = rsmp("custom")
train_sets = list(1:120)
test_sets = list(121:150)
custom$instantiate(task, train_sets, test_sets)
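
# Optional check (not part of the original post): the instantiated resampling
# stores exactly the row ids supplied above
print(custom$train_set(1))  # row ids 1 to 120
print(custom$test_set(1))   # row ids 121 to 150
# Note: iris rows are ordered by species, so this particular split puts only
# 'virginica' rows into the test set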


design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.ranger", "classif.rpart", "classif.featureless"),
    predict_type = "prob", predict_sets = c("train", "test")),
  resamplings = custom
)

print(design)


# execute the benchmark
bmr = benchmark(design)

measure = msr("classif.acc")

tab = bmr$aggregate(measure)
print(tab)
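
Since the learners were configured with predict_sets = c("train", "test"), accuracy can also be reported separately for the training and test predictions. A sketch, using msr() to set the id and predict_sets fields as shown in the mlr3 book:

measures = list(
  msr("classif.acc", predict_sets = "train", id = "acc_train"),
  msr("classif.acc", id = "acc_test")
)
tab_both = bmr$aggregate(measures)
print(tab_both)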

Solution

  • You can use the "custom" resampling scheme (rsmp("custom")), as in the code example above, to plug pre-defined train and test row ids directly into benchmark_grid(). The related "custom_cv" scheme instead builds cross-validation folds from a grouping factor; see the sketch below.
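
For completeness, a minimal sketch of "custom_cv" (assuming folds are defined by a factor of length task$nrow; each fold serves as the test set once, with the remaining folds used for training):

task = tsk("iris")
custom_cv = rsmp("custom_cv")
# assign every row to one of three folds
folds = factor(rep(1:3, length.out = task$nrow))
custom_cv$instantiate(task, f = folds)
print(custom_cv$iters)  # 3 resampling iterations, one per fold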