Why does my sklearn.model_selection.train_test_split() return the same samples of X_train, X_test, y_train, y_test each time I run the code, even though I have kept shuffle=True and I have not manually defined the seed value?
I am printing the samples like this:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)
print(y_test)
The random_state parameter of train_test_split seeds the shuffle, and your call sets it to 100, so the seed is in fact fixed. From the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html):
Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
To get a different split on each run, simply remove the random_state argument (or pass random_state=None, which is the default).
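As a rough illustration of the difference, here is a minimal sketch. The arrays x and y are made-up toy data standing in for the ones in the question, which are not shown there.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: hypothetical stand-ins for the x and y in the question.
x = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Fixed seed: the shuffle is identical on every call and every run.
_, _, _, y_test_a = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)
_, _, _, y_test_b = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)
same_with_seed = np.array_equal(y_test_a, y_test_b)
print(same_with_seed)

# No seed (random_state defaults to None): the shuffle draws on NumPy's
# global random state, so two calls will usually give different splits.
_, _, _, y_test_c = train_test_split(x, y, test_size=0.3, shuffle=True)
_, _, _, y_test_d = train_test_split(x, y, test_size=0.3, shuffle=True)
print(np.array_equal(y_test_c, y_test_d))
```

The first print should show True. The second will almost always show False, although an identical split by chance is not strictly impossible.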