python machine-learning scikit-learn cross-validation

How to 'leave pair out' where each test pair consists of [0 1]


I am building a machine learning classifier and want to use 'leave pair out' cross-validation, with non-overlapping pairs, where each pair contains one instance from the negative class and one from the positive, i.e. the ground truth labels or y values in each test set fold would be [0 1].

I can't work out how to achieve this in scikit-learn. I have 50 instances (25 in each class), so I can do:

from sklearn.model_selection import KFold
split = KFold(n_splits=50 // 2, shuffle=True, random_state=42)

to get non-overlapping pairs, but this doesn't guarantee test sets of [0 1]. I have looked at the documentation for LeavePGroupsOut, but it doesn't seem to be what I want.

Can anyone point me in the right direction? Thank you!


Solution

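  • I believe you want StratifiedKFold. This class builds folds that preserve the class proportions of y, so with 25 instances per class and 25 folds (as in your current KFold setup), each test fold should contain exactly one instance from each class. Note that split() needs the labels as well, i.e. split(X, y), and that shuffle=True randomizes which negative is paired with which positive.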
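
As a minimal sketch of the suggestion above (not tested against your data): the arrays X and y here are made-up placeholders with the same shape as described in the question (50 instances, 25 per class), and the names n_pairs, cv and test_folds are only illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data (an assumption): 50 instances, 25 per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = np.array([0] * 25 + [1] * 25)

# One fold per pair: n_splits equals the number of instances in each class.
n_pairs = int(np.bincount(y).min())
cv = StratifiedKFold(n_splits=n_pairs, shuffle=True, random_state=42)

# Stratification needs the labels, so y is passed to split().
test_folds = [test_idx for _, test_idx in cv.split(X, y)]

for test_idx in test_folds:
    # Each test fold holds exactly one negative and one positive instance.
    assert sorted(y[test_idx].tolist()) == [0, 1]
```

The same cv object can be passed to cross_val_score or GridSearchCV through their cv argument. This relies on the classes being perfectly balanced; with unequal class sizes the folds can no longer all be [0 1] pairs.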