
Is it possible to subset a classification task in mlr while keeping the positive/negative class ratio unchanged?


To run quick tests on a large classification task in mlr, I would like to first create smaller tasks that keep the positive/negative class ratio of the original task.

Currently I do this manually with subsetTask, passing a fixed index vector that preserves the class ratio as the subset argument.
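A minimal sketch of how such an index vector can be built by hand. It assumes a hypothetical helper, stratified_indices() (not part of mlr), that draws the same fraction of rows from each class separately. The class ratio is therefore preserved up to per-class rounding.

```r
# Hypothetical helper (not part of mlr): returns row indices that keep a
# fraction `perc` of each class, so the class ratio of `y` is preserved
# up to per-class rounding.
stratified_indices <- function(y, perc) {
  y <- as.factor(y)
  idx <- lapply(levels(y), function(lvl) {
    rows <- which(y == lvl)                 # rows belonging to this class
    n_keep <- round(length(rows) * perc)    # per-class sample size
    rows[sample.int(length(rows), n_keep)]  # sample without replacement
  })
  sort(unlist(idx))
}

# Toy target with a 1:4 positive/negative ratio
set.seed(1)
y <- factor(c(rep("pos", 40), rep("neg", 160)))
idx <- stratified_indices(y, perc = 0.25)
table(y[idx])  # 40 "neg" and 10 "pos" rows: still 1:4
```

With a real task, y would come from getTaskTargets(my_task), and the resulting idx would be passed as subsetTask(my_task, subset = idx).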

Is there a built-in way to do this? Something like "take 75% of this task, preserving the class ratio". Maybe using a resampling instance?

Thanks!


Solution

  • The function downsample(my_task, perc=0.05, stratify=TRUE) should be what you're looking for:

    https://mlr.mlr-org.com/reference/downsample.html

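    perc is the fraction of observations to keep, in [0, 1] (perc=0.05 keeps 5%). Setting stratify to TRUE (it defaults to FALSE) samples within each target class, so the class ratios of the original task are preserved, up to rounding.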

    Does that help?
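A minimal sketch of the downsample() call in use. It assumes mlr is installed and uses its bundled sonar.task example task in place of my_task. perc = 0.75 corresponds to the "75%" case from the question. The exact class counts depend on how mlr rounds within each class, so the sketch only compares class proportions.

```r
library(mlr)

# sonar.task: a binary classification example task that ships with mlr
task <- sonar.task

# Keep 75% of the observations; stratify = TRUE samples within each class
small_task <- downsample(task, perc = 0.75, stratify = TRUE)

# Class proportions before and after downsampling should be nearly identical
orig_ratio  <- prop.table(table(getTaskTargets(task)))
small_ratio <- prop.table(table(getTaskTargets(small_task)))
print(rbind(original = orig_ratio, downsampled = small_ratio))
getTaskSize(small_task)
```

With stratify = FALSE, the same call draws rows without regard to class, so the proportions can drift, especially for small perc values.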