machine-learning · scikit-learn · train-test-split · k-fold

K-folds: do we still need to implement train_test_split?


I've been reading quite a bit and I'm a little confused with k-folds. I understand the concept behind it, but I'm not sure how to deploy it.

The usual step I've been seeing after data exploration is train_test_split, then encoding and scaling with fit_transform on the training set and just transform on the test set, before testing which algorithms work. After that, they tune the hyper-parameters.
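
Roughly, the workflow I have in mind looks something like this (just a sketch, assuming scikit-learn with a StandardScaler and the iris dataset as a stand-in for my own X/y):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # placeholder for my own data

# Single split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only, then transform both
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```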

So if I were to use k-folds now, do I avoid using train_test_split? And at which point do we use k-folds?

Thanks!


Solution

  • No. K-fold splits your data into a train-test split K times, so you train K different models.

    This approach makes your model results more robust because you train K different models on different parts of your dataset, and you also predict on different parts of your data K times. Finally, you can simply take the average score of the K models, as in the sketch below.
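
As a rough sketch of how the K folds replace the single train_test_split (assuming scikit-learn's KFold and the same kind of X/y data as above), each fold plays the role of the test set once and the scores are averaged at the end:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # placeholder for your own data

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    # Each fold gives its own train/test split, like K separate train_test_splits
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Fit the scaler only on this fold's training data, then transform both
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

# The final estimate is simply the average of the K fold scores
print(np.mean(scores))
```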