
How can I get data after cross-validation?


I'm trying to build an image classifier for 7 classes using transfer learning with Xception, and now I'm trying to implement cross-validation. I know KFold returns indices, but how can I get the actual data values?

from sklearn.model_selection import KFold
import numpy as np

sample = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

kf = KFold(n_splits=3, shuffle=True)
for train_index, test_index in kf.split(sample):
    print("TRAIN:", train_index, "TEST:", test_index)

It returns:

TRAIN: [1 2 3 4 6 7] TEST: [0 5 8]
TRAIN: [0 1 2 4 5 8] TEST: [3 6 7]
TRAIN: [0 3 5 6 7 8] TEST: [1 2 4]

But what I want is

TRAIN: ['B', 'C', 'D', 'E', 'G', 'H'] TEST: ['A', 'F', 'I']
TRAIN: ['A', 'B', 'C', 'E', 'F', 'I'] TEST: ['D', 'G', 'H']
TRAIN: ['A', 'D', 'F', 'G', 'H', 'I'] TEST: ['B', 'C', 'E']

What should I do?


Solution

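  • kf.split returns the indices, not the actual samples. Because sample is a NumPy array, you can index it directly with those index arrays (integer array indexing). You only need to change the loop to: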

    for train_index, test_index in kf.split(sample):
        print("TRAIN:", sample[train_index], "TEST:", sample[test_index])
    

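    Result (the folds differ from the question's output because shuffle=True without a fixed random_state gives a different permutation on each run):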

    TRAIN: ['A' 'B' 'C' 'E' 'F' 'H'] TEST: ['D' 'G' 'I']
    TRAIN: ['A' 'D' 'F' 'G' 'H' 'I'] TEST: ['B' 'C' 'E']
    TRAIN: ['B' 'C' 'D' 'E' 'G' 'I'] TEST: ['A' 'F' 'H']
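Since the question is about an image classifier, here is a minimal sketch of the same idea applied to features and labels. It assumes the images and labels are already loaded into NumPy arrays; the names X, y and names, and the random toy data, are hypothetical placeholders, not part of the original answer. The same index arrays select aligned rows from every parallel array, random_state makes the shuffled folds reproducible, and .tolist() prints plain Python lists with commas, matching the format asked for in the question.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-ins for an image dataset: 9 tiny "images" of shape (4, 4, 3)
# and one integer class label (0..6) per image. Replace with your real arrays.
rng = np.random.default_rng(0)
X = rng.random((9, 4, 4, 3))
y = np.array([0, 1, 2, 3, 4, 5, 6, 0, 1])
names = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'])

# random_state fixes the shuffle so the folds are the same on every run.
kf = KFold(n_splits=3, shuffle=True, random_state=42)

folds = []  # one (X_train, y_train, X_test, y_test) tuple per fold
for train_index, test_index in kf.split(X):
    # The same index arrays pick matching rows from each parallel array.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    folds.append((X_train, y_train, X_test, y_test))
    # .tolist() converts to Python lists, printed as ['A', 'B', ...].
    print("TRAIN:", names[train_index].tolist(), "TEST:", names[test_index].tolist())
```

If your data is a plain Python list rather than a NumPy array, sample[train_index] raises a TypeError; convert it first with np.asarray(sample), or build the subset with [sample[i] for i in train_index].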