In order to make small tests on a large machine learning classification task in mlr, I would like to create small tasks first that maintain the positive/negative ratio of the original task.
Currently I am doing this manually, using the function subsetTask with the argument subset set to a fixed index vector that preserves the class ratio.
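For illustration, the manual approach looks roughly like the sketch below. It assumes mlr's bundled sonar.task as a stand-in for the real task, and the names frac, idx and small_task are made up for this example.

```r
library(mlr)

set.seed(1)                      # make the sampled indices reproducible
task <- sonar.task               # stand-in for the real task (assumption)
frac <- 0.75                     # fraction of each class to keep

y <- getTaskTargets(task)        # class labels, one per observation

# Sample the same fraction of row indices within each class.
idx <- unlist(lapply(split(seq_along(y), y), function(i) {
  i[sample.int(length(i), round(frac * length(i)))]
}), use.names = FALSE)

small_task <- subsetTask(task, subset = idx)

# Compare class proportions before and after subsetting.
print(prop.table(table(y)))
print(prop.table(table(getTaskTargets(small_task))))
```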
Is there any way to do this internally? Something like "Take 75% of this task, preserving the class ratio". Maybe using a resampling instance?
Thanks!
The function downsample(my_task, perc = 0.05, stratify = TRUE) should be what you're looking for:
https://mlr.mlr-org.com/reference/downsample.html
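Setting the argument stratify to TRUE (it defaults to FALSE) keeps the class ratios of the original data. The argument perc is the fraction of observations to keep, so perc = 0.75 gives the 75% from your example.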
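Here is a minimal sketch of both routes, assuming mlr's bundled sonar.task in place of my_task. The first uses downsample directly. The second uses a stratified holdout resampling instance, as the question suggested. The variable names are only illustrative.

```r
library(mlr)

set.seed(1)                      # sampling is random, so fix the seed
my_task <- sonar.task            # stand-in for the real task (assumption)

# Route 1: keep 75% of the observations, stratified on the target.
small_task <- downsample(my_task, perc = 0.75, stratify = TRUE)

# Route 2: a stratified holdout split; use its training indices as the subset.
rdesc <- makeResampleDesc("Holdout", split = 0.75, stratify = TRUE)
rin <- makeResampleInstance(rdesc, task = my_task)
small_task2 <- subsetTask(my_task, subset = rin$train.inds[[1]])

# Class proportions should be close to those of the full task in both cases.
print(prop.table(table(getTaskTargets(my_task))))
print(prop.table(table(getTaskTargets(small_task))))
print(prop.table(table(getTaskTargets(small_task2))))
```

With very small values of perc and a rare class, the proportions can only be matched approximately, so it is worth checking the table output before relying on the subset.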
Does that help?