machine-learning, regression, random-forest

Test accuracy is greater than train accuracy: what to do?


I am using a random forest. My test accuracy is 70%, while my train accuracy is only 34%. How can I solve this problem?


Solution

  • Test accuracy should not normally be higher than train accuracy, since the model is optimized on the training data. Ways in which this can still happen:

    • you did not use the same source dataset for test. You should do a proper train/test split so that both parts have the same underlying distribution. Most likely you provided a completely different (and easier) dataset for test.

    • an unreasonably high degree of regularization was applied. Even then, the test data distribution would have to differ somewhat from the train distribution for the observed behavior to occur.
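
To make the "proper train/test split" point concrete, here is a minimal sketch in plain Python that carves both parts out of one dataset and then compares their label proportions. The toy labels, the helper names `stratified_split` and `label_shares`, and the 20% test fraction are all assumptions for illustration, not something from the question. In practice scikit-learn's `train_test_split` with its `stratify` argument does this kind of split for you.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split row indices of ONE dataset into train/test, per class (hypothetical helper)."""
    rng = random.Random(seed)
    # Group row indices by label so each class is split separately.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so rows are not ordered by class.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


def label_shares(labels, indices):
    """Return the fraction of each label among the selected rows (hypothetical helper)."""
    counts = Counter(labels[i] for i in indices)
    return {label: count / len(indices) for label, count in counts.items()}


# Assumed toy data: 100 rows, 70 of class "a" and 30 of class "b".
labels = ["a"] * 70 + ["b"] * 30

train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
train_shares = label_shares(labels, train_idx)
test_shares = label_shares(labels, test_idx)

print("train size:", len(train_idx), "test size:", len(test_idx))
print("train label shares:", train_shares)
print("test label shares:", test_shares)
```

If the label shares (or, for features, simple summaries such as means and ranges) differ a lot between the two parts, that is a hint that test does not come from the same distribution as train, which is the situation described in the bullets above.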