Tags: machine-learning, deep-learning, cross-validation, training-data, kaggle

Validation and Testing accuracy widely different


I am currently working on a dataset on Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49.

However, the same model gives an accuracy of 0.05 on the testing data.

I am using a neural network as my model.

So, what are the possible reasons for this, and how does one go about checking for and correcting such issues?


Solution

  • Reasons for a high generalization gap:

    1. Different distributions: The validation and test sets might come from different distributions. Verify in your code that they really are sampled from the same process; a minimal check of the class distributions is sketched after this list.
    2. Number of samples: The validation and/or test set may simply be too small. With few samples, the empirical data distributions can differ substantially from each other, which by itself can explain the different reported accuracies. One example would be a dataset with thousands of images but also thousands of classes: the test set might then contain classes that do not appear in the validation set (and vice versa). Use cross-validation to check whether the test accuracy is consistently lower than the validation accuracy, or whether the two simply vary a lot from fold to fold (see the cross-validation sketch after this list).
    3. Hyperparameter overfitting: This is also related to the size of the two sets. Did you do hyperparameter tuning? If so, check whether the accuracy gap already existed before you tuned, because you might have "overfitted" the hyperparameters to the validation set (see the before/after-tuning sketch after this list).
    4. Loss function vs. accuracy: You reported accuracies, but did you also check the train, validation and test losses? You train your model on the loss function, so that is the most direct performance measure. If accuracy is only loosely coupled to your loss and the test loss is roughly as low as the validation loss, that could explain the accuracy gap (see the loss-vs-accuracy sketch after this list).
    5. Bug in the code: If the test and validation sets are sampled from the same process and are sufficiently large, they are interchangeable, so the test and validation losses should be approximately equal. If you have checked the four points above, my next best guess is a bug in the code, for example accidentally training the model on the validation set as well (a cheap overlap check is sketched after this list). You could also train on a larger dataset and then see whether the accuracies still diverge.
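
For point 1, here is a minimal sketch of how you could compare the class distributions of the two sets. It assumes you have the validation and test labels as NumPy arrays `y_val` and `y_test`; the random arrays below are only placeholders for your own data.

```python
import numpy as np
from collections import Counter

# Placeholder labels -- replace with your real validation / test labels.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 10, size=500)
y_test = rng.integers(0, 10, size=500)

val_counts = Counter(y_val.tolist())
test_counts = Counter(y_test.tolist())

# Classes that appear in only one of the two sets are a red flag.
print("classes only in validation set:", set(val_counts) - set(test_counts))
print("classes only in test set:", set(test_counts) - set(val_counts))

# Compare the relative class frequencies side by side.
for cls in sorted(set(val_counts) | set(test_counts)):
    p_val = val_counts.get(cls, 0) / len(y_val)
    p_test = test_counts.get(cls, 0) / len(y_test)
    print(f"class {cls}: val {p_val:.3f} vs. test {p_test:.3f}")
```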
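For point 2, a quick way to see how much the accuracy fluctuates between splits is plain k-fold cross-validation. The sketch below uses scikit-learn's `StratifiedKFold`, with an `MLPClassifier` standing in for your network and random placeholder data in place of your features and labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Placeholder data -- replace with your combined train + validation features/labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.integers(0, 5, size=1000)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_accs = []
for train_idx, val_idx in skf.split(X, y):
    # MLPClassifier stands in for your neural network.
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_accs.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))

# A large spread across folds suggests the splits are too small or too different.
print("fold accuracies:", np.round(fold_accs, 3))
print(f"mean ± std: {np.mean(fold_accs):.3f} ± {np.std(fold_accs):.3f}")
```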
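For point 3, you can evaluate an untuned baseline next to your tuned configuration and compare the validation/test gap of both. Again, `MLPClassifier`, the random splits and the "tuned" settings below are hypothetical stand-ins for your own model, data and search result.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Placeholder splits -- replace with your actual train / validation / test data.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((800, 20)), rng.integers(0, 5, size=800)
X_val, y_val = rng.random((200, 20)), rng.integers(0, 5, size=200)
X_test, y_test = rng.random((200, 20)), rng.integers(0, 5, size=200)

# "baseline" uses untouched defaults; "tuned" stands in for whatever
# hyperparameters your search picked on the validation set.
candidates = {
    "baseline": MLPClassifier(max_iter=300, random_state=0),
    "tuned": MLPClassifier(hidden_layer_sizes=(256, 128), alpha=1e-5,
                           max_iter=300, random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: val={val_acc:.3f}  test={test_acc:.3f}  gap={val_acc - test_acc:.3f}")
```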
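For point 4, compare the quantity you optimize (e.g. cross-entropy) with the quantity you report (accuracy) on both sets. The sketch assumes your model outputs class probabilities; `proba_val` and `proba_test` below are random placeholders for those outputs.

```python
import numpy as np
from sklearn.metrics import log_loss, accuracy_score

# Placeholder labels and predicted probabilities -- replace proba_val / proba_test
# with the class probabilities your network actually outputs.
rng = np.random.default_rng(0)
n_classes = 5
y_val = rng.integers(0, n_classes, size=200)
y_test = rng.integers(0, n_classes, size=200)
proba_val = rng.dirichlet(np.ones(n_classes), size=200)
proba_test = rng.dirichlet(np.ones(n_classes), size=200)

labels = np.arange(n_classes)
# Report the loss next to the accuracy for each set.
for name, y_true, proba in [("val", y_val, proba_val), ("test", y_test, proba_test)]:
    loss = log_loss(y_true, proba, labels=labels)
    acc = accuracy_score(y_true, proba.argmax(axis=1))
    print(f"{name}: loss={loss:.3f}  accuracy={acc:.3f}")
```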
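For point 5, one cheap sanity check for leakage is to count identical rows shared between the splits you actually feed to the model. The tiny DataFrames below are placeholders for your own train/validation/test frames.

```python
import pandas as pd

# Placeholder frames -- replace with the exact DataFrames you feed to the model.
train_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
val_df = pd.DataFrame({"a": [3, 7], "b": [6, 8]})
test_df = pd.DataFrame({"a": [9], "b": [10]})

def shared_rows(df_a, df_b):
    """Count identical rows that appear in both DataFrames (join on all columns)."""
    return len(df_a.drop_duplicates().merge(df_b.drop_duplicates(), how="inner"))

# Any non-zero overlap between train and validation/test hints at leakage.
print("train/val shared rows:", shared_rows(train_df, val_df))
print("train/test shared rows:", shared_rows(train_df, test_df))
print("val/test shared rows:", shared_rows(val_df, test_df))
```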