
How to apply pipeline_smote just on training set in mlr3pipelines?


I am working on an imbalanced dataset with a two-class response variable using mlr3. I want to apply SMOTE to oversample the minority class. I learned that this method should be applied only to the training set, not to the test set. However, if I understand correctly, an mlr3 pipeline manipulates the whole dataset before the task is set up, and the dataset is only split into training and test sets after that. How can I apply SMOTE (mlr_pipeops_smote) only to the training set?


Solution

  • SMOTE is automatically applied only to the training set; see the documentation for mlr_pipeops_smote:

    The output during prediction is the unchanged input.
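A minimal sketch of how this could be checked, assuming the mlr3, mlr3pipelines and smotefamily packages are installed. The built-in "sonar" task (numeric features only), the rpart learner, dup_size = 1 and the 3-fold CV are illustrative choices, not part of the original question. The first part trains and predicts with the PipeOp directly and compares row counts; the second part wraps the PipeOp and a learner into one GraphLearner, so that resampling fits the SMOTE step on each training fold only.

```r
library(mlr3)
library(mlr3pipelines)  # po("smote") additionally needs the smotefamily package

set.seed(1)

# Illustrative task; po("smote") expects numeric features
task = tsk("sonar")

# dup_size = 1 is an assumption made here so that synthetic
# minority-class rows are actually generated for this task
pop = po("smote", dup_size = 1)

# Training phase: synthetic rows are appended to the task
train_out = pop$train(list(task))[[1]]

# Prediction phase: the task passes through unchanged
predict_out = pop$predict(list(task))[[1]]

c(original = task$nrow, train = train_out$nrow, predict = predict_out$nrow)

# In practice, combine the PipeOp and a learner into a single learner;
# resample() then applies SMOTE inside each training fold only
glrn = as_learner(pop %>>% lrn("classif.rpart"))
rr = resample(task, glrn, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.bacc"))
```

The row count after training should be larger than the original, while the row count after prediction should match it. Because the oversampling is part of the learner rather than a separate preprocessing step on the data, the test folds are never resampled.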