python, validation, machine-learning, hyperparameters

Machine learning query


Please correct me if I am wrong: "The training set is used to calculate the parameters of a machine learning model, the validation set is used to calculate the hyperparameters of the same model (we use the same weights with different hyperparameters), and the test set is used to evaluate the model." If this is true, can someone explain the whole process in a little more detail? TIA.


Solution

  • Suppose you train a random forest classifier on 70% of your data. Training fits the model's parameters; for a random forest, these are the features and thresholds used at each split in each tree. The model also has hyperparameters, such as the maximum depth of the trees, which are set before training and affect its performance. To choose a depth, train a separate model for each candidate value on the training set and compare their accuracy on the validation set (say, another 10% of your data), for example by plotting accuracy against depth. Note that each hyperparameter setting requires retraining, so the weights are not shared across settings. Once you have found the best hyperparameter values, evaluate the classifier on the test set (the remaining 20% of the data), which gives an estimate of performance on unseen data.
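
The 70/10/20 workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` in place of real data, and the candidate depths and `n_estimators=50` are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First split: 700 training samples (70%), 300 held out.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=700, random_state=0, stratify=y
)
# Second split: the held-out 300 become 100 validation (10%) and 200 test (20%).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=200, random_state=0, stratify=y_rest
)

# Hypothetical candidate values for the max_depth hyperparameter.
candidate_depths = [2, 4, 8, 16]

# Train a fresh model per depth on the training set; score it on the validation set.
val_scores = {}
for depth in candidate_depths:
    model = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    val_scores[depth] = accuracy_score(y_val, model.predict(X_val))

# Pick the depth with the highest validation accuracy.
best_depth = max(val_scores, key=val_scores.get)

# Refit with the chosen depth and evaluate once on the untouched test set.
final_model = RandomForestClassifier(
    n_estimators=50, max_depth=best_depth, random_state=0
)
final_model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))

print("validation accuracy by depth:", val_scores)
print("chosen depth:", best_depth)
print("test accuracy:", test_accuracy)
```

The test set is touched only once, after the depth has been chosen, so the reported test accuracy is not biased by the hyperparameter search. Plotting `val_scores` against depth would give the accuracy graph mentioned in the answer.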