python machine-learning data-science cross-validation knn

Cross Validation With Repetition


I have a question regarding the cross-validation used in ML problems. Suppose we apply 5-fold cross-validation to a dataset on two separate occasions, say once on Monday and once on Friday. Are the elements in a particular fold on Monday the same as the elements in that same fold on Friday?

Does this explain why this code:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

iris=load_iris()

X=iris.data
y=iris.target
model=KNeighborsClassifier(n_neighbors=5)

cvs=cross_val_score(model, X, y, cv=5)
print(cvs)

always gives the same results on every execution:

[0.96666667 1.         0.93333333 0.96666667 1.        ]

Solution

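  • As you can read in the documentation of cross_val_score, when cv is an integer and the estimator is a classifier with a binary or multiclass target, it performs Stratified K-Folds cross-validation under the hood, which by default does not shuffle your data (X, y). The folds are therefore determined entirely by the order of the samples, so yes: a given fold contains the same elements on Monday as on Friday. Each time you call cross_val_score you train the same model on the same folds and validate on the same fold, and since nothing in this setup is random, you obtain the same result.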