python machine-learning keras train-test-split

Why does my `train_test_split()` return the same samples?


Why does sklearn.model_selection.train_test_split() return the same samples for X_train, X_test, y_train, and y_test each time I run the code, even though I have set shuffle=True and have not manually defined a seed value?

I am printing the samples like this:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)

print(y_test)

Solution

  • The random_state parameter of train_test_split seeds the random number generator that shuffles the data before the split. From the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html):

    Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

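    Your call passes random_state=100, which fixes the seed, so every run produces the same split. To get a different split on each run, remove the parameter (or pass random_state=None, which is the default).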
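
The difference can be checked with a small sketch. The arrays x and y below are made-up toy data standing in for the asker's inputs, which are not shown in the question; the variable names are likewise only illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption; the question does not show the real x and y).
x = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Fixed seed: two separate calls shuffle identically, so the test sets match.
_, _, _, y_test_a = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)
_, _, _, y_test_b = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)
same_with_seed = np.array_equal(y_test_a, y_test_b)
print("same split with random_state=100:", same_with_seed)

# No seed (random_state defaults to None): each call draws a fresh shuffle,
# so the test sets will normally differ between calls and between runs.
_, _, _, y_test_c = train_test_split(x, y, test_size=0.3, shuffle=True)
_, _, _, y_test_d = train_test_split(x, y, test_size=0.3, shuffle=True)
print("same split without a seed:", np.array_equal(y_test_c, y_test_d))
```

If reproducibility is wanted only during debugging, one option is to keep the seed in a single variable and set it to None for the final runs.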