I've been reading quite a bit and I'm a little confused about k-folds. I understand the concept behind it, but I'm not sure how to deploy it.
The usual steps I've been seeing after data exploration are: train_test_split, then encoding and scaling with fit_transform on the training set (and just transform on the test set), before testing which algorithms work. After that, they tune the hyper-parameters.
So if I were to use k-folds now, do I avoid using train_test_split? And at which point do we use k-folds?
Thanks!
No. K-fold splits your data into train/test splits K times, so you train K different models.
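As a minimal sketch of that loop (assuming scikit-learn; the iris dataset, StandardScaler and LogisticRegression below are just placeholders for your own data and estimator), KFold hands you K different train/test index splits, and you refit your scaler/encoder and model inside each fold:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # placeholder dataset

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # fit_transform on the training fold only, transform on the test fold,
    # so the held-out fold never leaks into the preprocessing
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print(f"fold {fold}: {model.score(X_test, y_test):.3f}")
```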
This approach makes your model results more robust because you train K different models on different parts of your dataset, and you also predict on different parts of your data K times. Finally, you can simply take the average score of the K models.
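If you're on scikit-learn, cross_val_score with a Pipeline does the same thing in one call (again, the dataset and estimator here are only examples); the Pipeline re-fits the scaler inside every training fold for you, and you average the K fold scores at the end:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # placeholder dataset

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5)  # K = 5 folds
print(scores)         # one score per fold
print(scores.mean())  # the average score across the K models
```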