I have a question about the cross-validation used in ML problems. Suppose we run 5-fold cross-validation on the same dataset at two separate times, say once on Monday and once on Friday. Are the elements that end up in a particular fold on Monday the same elements that end up in that fold on Friday?
Does this explain why this code:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
iris = load_iris()
X = iris.data
y = iris.target
model = KNeighborsClassifier(n_neighbors=5)
cvs = cross_val_score(model, X, y, cv=5)
print(cvs)
always gives the same results on every execution:
[0.96666667 1. 0.93333333 0.96666667 1. ]
As you can read in the documentation of cross_val_score, when cv is an integer and the estimator is a classifier, it performs a Stratified K-Folds cross-validation under the hood, which by default does not shuffle your data (X, y). So yes, each fold contains the same elements on every run.
Therefore, each time you calculate the cross_val_score you train the same model on the same folds and validate on the same fold, and so you obtain the same result.
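To check this yourself, here is a minimal sketch that compares the test indices of each fold across two separate splits. It assumes that cv=5 with a classifier behaves like StratifiedKFold(n_splits=5) with default settings; the helper name fold_test_indices and the seeds 0 and 1 are only illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)


def fold_test_indices(cv):
    """Return the test indices of every fold produced by the splitter (hypothetical helper)."""
    return [test_idx for _, test_idx in cv.split(X, y)]


# Two independent "runs" with the default splitter (shuffle=False).
monday = fold_test_indices(StratifiedKFold(n_splits=5))
friday = fold_test_indices(StratifiedKFold(n_splits=5))
same_without_shuffle = all(
    np.array_equal(a, b) for a, b in zip(monday, friday)
)

# With shuffling enabled, different seeds give different fold memberships.
shuffled_a = fold_test_indices(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
)
shuffled_b = fold_test_indices(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
)
same_with_different_seeds = all(
    np.array_equal(a, b) for a, b in zip(shuffled_a, shuffled_b)
)

print("identical folds without shuffling:", same_without_shuffle)
print("identical folds with different seeds:", same_with_different_seeds)
```

If you want the scores to vary between runs, one option is to pass such a shuffled splitter as the cv argument of cross_val_score instead of the integer 5; fixing random_state makes the shuffled folds reproducible again.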