
Tuning of hyperparameters and evaluation using cross validation

I am having trouble grasping the standard way to use cross validation for hyperparameter tuning and evaluation. I am trying to do 10-fold CV. Which of the following is the correct way?

  1. All of the data gets used for parameter tuning (e.g. using random grid search with cross validation). This returns the best hyperparameters. Then a new model is constructed with these hyperparameters, and it is evaluated with another cross validation (nine folds for training, one for testing; in the end, metrics such as accuracy or the confusion matrix are averaged).

  2. Another way that I found is to first split the data into a train and a test set, and then perform cross validation only on the training set. One would then evaluate using the test set. However, as I understand it, that would undermine the whole concept of cross validation, as the idea behind it is to be independent of the split, right?

  3. Lastly, my supervisor told me that I should use eight folds for training, one for hyperparameter estimation and one for testing (and therefore evaluation). However, I could not find any material where this approach had been used. Is that a standard procedure, or have I misunderstood something there?


  • In general, you can split your data into three sets:

    • training set
    • validation set
    • test set

    Test set:
    The test set is the easiest one to explain.
    Once you've created your test set (15-30% of the data), store it somewhere and DON'T TOUCH it again until you think you're done.
    - The reason is simple: once you start focusing on this data set (e.g. to increase the AUC), you start to overfit to it.

    The same applies, more or less, to the validation set. When you tune your hyperparameters, you start focusing on this set, which means you are no longer generalizing (and a good model should work on all data, not only on the test and validation sets).
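As a minimal sketch of the "set the test set aside first" step, assuming the samples fit in a plain Python list: the helper name `hold_out_test_set`, the 20% fraction and the fixed seed are illustrative choices, not something prescribed above.

```python
import random


def hold_out_test_set(samples, test_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed, then split off the test set.

    Returns (development, test). The development part is what is later
    divided into training and validation data; the test part is stored
    away and only used for the final evaluation.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 100 sample ids standing in for real rows.
samples = list(range(100))
development, test = hold_out_test_set(samples, test_fraction=0.2)
print(len(development), len(test))  # 80 20
```

Fixing the seed keeps the split reproducible, so the same test set stays untouched across all later experiments.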

    That being said, you now only have the training and validation sets left.

    Cross validation: one motivation for using cross validation is to get a better, more general view of your model and data (imagine that some special cases only existed in the validation set), so that you don't take the result of a single split for granted. The main downside of e.g. 10-fold cross validation is that it takes roughly 10 times longer to finish, but it gives you more trustworthy results. For example, if you do 10-fold cross validation and your AUC fluctuates across folds (80, 85, 75, 77, 81, 65, ...), then you might have some data issues. In a perfect scenario, the difference in AUC between folds should be small.
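To make "the AUC fluctuates" a bit more concrete, here is a small sketch that summarizes the spread of per-fold scores. It uses only the six AUC values quoted above (the remaining folds are not given), and which summary statistics to look at is my own choice.

```python
from statistics import mean, stdev

# The six per-fold AUC values quoted above, in percent.
fold_auc = [80, 85, 75, 77, 81, 65]

auc_mean = mean(fold_auc)
auc_std = stdev(fold_auc)  # sample standard deviation across folds
auc_range = max(fold_auc) - min(fold_auc)

# Index of the fold with the lowest score, worth inspecting by hand.
worst_fold = min(range(len(fold_auc)), key=fold_auc.__getitem__)

print(f"mean={auc_mean:.1f} std={auc_std:.1f} range={auc_range}")
print(f"weakest fold: index {worst_fold}, AUC {fold_auc[worst_fold]}")
```

A large range driven by a single fold, as with the 65 here, is a hint to look at what is different about the samples in that fold.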

    Nevertheless, here is what I would do (it also depends on your resources, model, time and data set size):

    • Create 10 random folds (and keep track of them).

    • Do a 10-fold grid search if possible, to get a general view of the importance of each parameter. You don't have to take small steps. E.g. random forest has a max_features parameter, but if you notice that all the models perform worse when that value is log2, then you can just drop that value from the search.

    • Check which hyperparameters performed well.
    • Do a 10-fold random search or grid search in the areas that performed well.

    But always use the same folds for each new experiment; this way you can compare the models with each other. Often you'll see that some folds are more difficult than others, but they are difficult for all the models.
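The workflow above (fixed folds, a coarse search, then a finer search on the same folds) could be sketched roughly as follows. Everything here is illustrative: the data is synthetic, the toy 1-D nearest-neighbours classifier merely stands in for a real model, and the names `make_folds`, `knn_predict` and `cv_scores` are hypothetical helpers. `data` represents the training plus validation portion only, with the test set already held out.

```python
import random
from statistics import mean


def make_folds(n_samples, k=10, seed=0):
    """Shuffle the indices once and deal them into k folds to be reused."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k_neighbors):
    """Toy 1-D k-nearest-neighbours classifier (stand-in for a real model)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cv_scores(data, folds, k_neighbors):
    """Per-fold accuracy for one hyperparameter value on the fixed folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        correct = sum(
            knn_predict(train, data[i][0], k_neighbors) == data[i][1]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


# Synthetic two-class data: (feature, label) pairs.
rng = random.Random(42)
data = []
for _ in range(200):
    label = rng.randint(0, 1)
    data.append((rng.gauss(2.0 * label, 1.0), label))

# Step 1: create the folds once and keep them.
folds = make_folds(len(data), k=10, seed=0)

# Step 2: coarse grid with large steps.
coarse = {k: cv_scores(data, folds, k) for k in (1, 5, 25)}
best_coarse = max(coarse, key=lambda k: mean(coarse[k]))

# Steps 3 and 4: finer search around the best coarse value, same folds.
fine_grid = sorted({max(1, best_coarse + d) for d in (-4, -2, 0, 2, 4)})
fine = {k: cv_scores(data, folds, k) for k in fine_grid}
best_fine = max(fine, key=lambda k: mean(fine[k]))

print("best coarse value:", best_coarse)
print("best fine value:", best_fine)
print("per-fold accuracy:", fine[best_fine])
```

Because every candidate is scored on identical folds, the per-fold lists in `coarse` and `fine` can be compared position by position, which is what makes a hard fold recognizable as hard for all models. With scikit-learn, the same idea would be to pass one fixed set of splits to every search via the `cv` argument of `GridSearchCV` or `RandomizedSearchCV`.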
