I've heard a lot of people talk about some of the causes, but they never really answer whether it should be fixed or not. I checked my dataset for leaks, and I took 20% of a TFRecords dataset at random for my validation set. I'm starting to suspect that my model has too many regularization layers. Should I reduce my regularization to get the validation line above the training line? Or does it even matter?
There is nothing wrong with validation loss being lower than training loss. It can simply reflect the distribution of the validation set: if its samples happen to be easier for the model than the training samples, its loss will be lower. If you have a lot of dropout in your model, this can easily be the case, because the training loss is calculated with dropout active, while dropout is disabled when the validation loss is calculated. The real question is whether your training accuracy is at an acceptable level. If it is not, reduce the regularization in the model.
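To illustrate the dropout effect, here is a minimal sketch in plain Python (not TensorFlow, so it runs anywhere). The toy linear model, its weights and the synthetic data are all hypothetical. The same model with the same weights is scored twice: once with inverted dropout active, as during training, and once with it disabled, as during validation. Because the weights fit the synthetic targets exactly, any loss in the first case comes purely from dropout noise.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: in training mode, zero each value with probability
    `rate` and scale the survivors by 1 / (1 - rate); otherwise pass through."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def predict(weights, features, rate, training, rng):
    # Hypothetical toy linear model with dropout applied to its inputs.
    dropped = dropout(features, rate, training, rng)
    return sum(w * x for w, x in zip(weights, dropped))


def mse(weights, data, rate, training, rng):
    # Mean squared error over (features, target) pairs.
    errors = [(predict(weights, x, rate, training, rng) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


rng = random.Random(0)
weights = [0.5, -1.0, 2.0]

# Synthetic data whose targets the fixed weights reproduce exactly.
data = []
for _ in range(500):
    x = [rng.uniform(-1.0, 1.0) for _ in weights]
    data.append((x, sum(w * xi for w, xi in zip(weights, x))))

train_mode_loss = mse(weights, data, rate=0.5, training=True, rng=rng)
eval_mode_loss = mse(weights, data, rate=0.5, training=False, rng=rng)
print(f"loss with dropout active:   {train_mode_loss:.4f}")
print(f"loss with dropout disabled: {eval_mode_loss:.4f}")
```

The model is identical in both runs; only the dropout mode differs, and the training-mode loss comes out higher. In Keras, `tf.keras.layers.Dropout` makes the same switch through its `training` argument: `fit` runs it in training mode, while `evaluate` and `predict` run it in inference mode. So calling `model.evaluate` on the training data gives a training loss computed the same way as the validation loss, which is a fairer comparison.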