Tags: tensorflow, validation, machine-learning, deep-learning, conv-neural-network

Why are the validation loss and accuracy oscillating so strongly?


I'm currently training a CNN to detect whether a person is wearing a mask. Unfortunately, I do not understand why my validation loss is so high. As I noticed, the data I am validating on is sorted by class (the classes being the outputs of the net). Does that have any impact on my validation accuracy and loss? I tested the model in a live computer-vision setting and it works excellently, but the validation loss and accuracy still look very wrong. What are the reasons for this?

[Figures: training and validation loss; training and validation accuracy]


Solution

  • Intuitively, this phenomenon can arise from several factors:

    1. You may be using a very large batch size (>=128), which can cause such fluctuations, since convergence can be negatively impacted when the batch size is too high. Several papers have studied this phenomenon. This may or may not be the case for you.
    2. Your validation set is probably too small. I have experienced such fluctuations when the validation set was small in absolute size (number of samples, not necessarily the training-validation split percentage). In such circumstances, the change in weights after each epoch has a much more visible impact on the validation loss, and consequently on the validation accuracy.

    In my opinion, and according to my experience, if you have checked that your model works well in real life, you can decide to train for only 50 epochs: the graph suggests that is an optimal cut-off point, since the fluctuations intensify after it and a small amount of overfitting also starts to appear.
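    The points above, together with the class-sorted validation data mentioned in the question, can be sketched without any framework. This is a minimal NumPy illustration; the array sizes, the 0/1 mask encoding, and the placeholder features are assumptions for demonstration, not taken from the original model:

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical class-sorted validation set: first all "no mask" (0),
    # then all "mask" (1) -- mimicking data that is ordered by class.
    x_val = np.arange(200, dtype=np.float32).reshape(200, 1)  # placeholder features
    y_val = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

    # Shuffle features and labels with the SAME permutation, so pairs stay
    # aligned while each validation batch now mixes both classes instead of
    # seeing one class at a time.
    perm = rng.permutation(len(y_val))
    x_val, y_val = x_val[perm], y_val[perm]

    # Keep the batch size moderate (e.g. 32 rather than >=128), per point 1,
    # and note how few batches a 200-sample validation set yields (point 2):
    batch_size = 32
    num_batches = int(np.ceil(len(y_val) / batch_size))
    ```

    In Keras you would get the same effect by shuffling the validation arrays once before passing them to `model.fit`, and by lowering `batch_size`.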