I would like to compare several machine learning algorithms in a classification task using the benchmark_grid() function in mlr3. According to https://mlr3book.mlr-org.com/benchmarking.html, benchmark_grid() takes a resampling scheme to partition the data in the task into training and test sets. However, I would like to use a manual partitioning. How can I specify the training and test sets manually when using benchmark_grid()?
EDIT: Code example based on the suggestion by pat-s
# use benchmark() from mlr3 to compare different classification models on the iris data set using a manually
# pre-defined partitioning into training and test data sets (hold-out sampling)
library("mlr3verse")
# Instantiate Task
task = tsk("iris")
# Instantiate Custom Resampling
# hold-out sample with pre-defined partitioning into train and test set
custom = rsmp("custom")
train_sets = list(1:120)
test_sets = list(121:150)
custom$instantiate(task, train_sets, test_sets)
design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.ranger", "classif.rpart", "classif.featureless"),
    predict_type = "prob", predict_sets = c("train", "test")),
  resamplings = custom
)
print(design)
# execute the benchmark
bmr = benchmark(design)
measure = msr("classif.acc")
tab = bmr$aggregate(measure)
print(tab)
You can use the "custom" resampling scheme (rsmp("custom")) to pass manually defined train and test indices, as in the edit above. mlr3 also offers "custom_cv" if you want to define the folds of a cross-validation yourself.
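For completeness, here is a minimal sketch of the "custom_cv" variant, where you assign each observation to a fold via a factor instead of listing train/test indices. The alternating 2-fold assignment is just an illustrative assumption, not part of the original question:

```r
library(mlr3)

task = tsk("iris")

# "custom_cv" lets you define cross-validation folds manually:
# each observation is assigned to a fold via a factor vector
custom_cv = rsmp("custom_cv")

# hypothetical assignment: alternate rows between two folds "a" and "b"
folds = factor(rep(c("a", "b"), length.out = task$nrow))
custom_cv$instantiate(task, f = folds)

# each fold is held out once, so this yields 2 resampling iterations
rr = resample(task, lrn("classif.rpart"), custom_cv)
rr$aggregate(msr("classif.acc"))
```

In contrast to rsmp("custom"), which takes explicit train/test index lists, "custom_cv" derives the train/test splits from the fold assignment automatically.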