I use StratifiedKFold and a form of grid search for my Logistic Regression.
skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=SEED)
I call this for loop for each combination of parameters:
for fold, (trn_idx, test_idx) in enumerate(skf.split(X, y)):
My question is, are trn_idx and test_idx the same for each fold every time I run the loop?
For example, if fold0 contains trn_idx = [1,2,5,7,8] and test_idx = [3,4,6], is fold0 going to contain the same trn_idx and test_idx the next 5 times I run the loop?
Yes, the stratified k-fold split is fixed as long as random_state=SEED is a fixed integer. shuffle=True only shuffles the samples within each class (together with their targets) before they are divided into folds, and that shuffle is seeded by random_state.
This means that each fold will always contain the same indices:
from sklearn.model_selection import StratifiedKFold

x = list(range(10))
y = [1]*5 + [2]*5
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for fold, (trn_idx, test_idx) in enumerate(skf.split(x, y)):
    # print the train and test indices of each fold
    print(trn_idx, test_idx)
Output:
[1 2 4 5 7 9] [0 3 6 8]
[0 1 3 5 6 8 9] [2 4 7]
[0 2 3 4 6 7 8] [1 5 9]
The output is the same no matter how many times I run this code.
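If you would rather check this programmatically than compare printed output by eye, here is a minimal sketch. It assumes an integer seed, and the toy data and the helper name collect_splits are made up for illustration. It calls split() twice on the same splitter and compares the resulting indices:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

SEED = 42  # assumption: any fixed integer works here

# Toy data for illustration only: 10 samples, two balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array([1] * 5 + [2] * 5)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=SEED)


def collect_splits(splitter, X, y):
    """Hypothetical helper: materialize every fold as plain lists of indices."""
    return [(trn.tolist(), tst.tolist()) for trn, tst in splitter.split(X, y)]


# Simulate running the loop for two different parameter combinations.
first_pass = collect_splits(skf, X, y)
second_pass = collect_splits(skf, X, y)

splits_match = first_pass == second_pass
print(splits_match)
```

If you do not want to depend on the seed at all, another option is to build the list of folds once (for example with list(skf.split(X, y))) and reuse that same list for every parameter combination.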