
How to tackle overfitting with a fully convolutional network in Caffe


I have a fully convolutional network (specifically the stacked hourglass network) in Caffe. After each convolutional layer I have a batch normalisation layer, a scale layer and a ReLU layer. However, I am running into overfitting. Usually I would increase my dataset (which is not possible here) or add Dropout layers, but I have read that Dropout layers are not useful in fully convolutional networks, so I have no idea how to tackle the problem. Is there anything I can do apart from what I have mentioned? Might regularization be helpful in this case?


Solution

  • Here is a handy picture I stole from the interwebs: a chart of things to try when your deep learning model is having trouble. You say that you have heard that Dropout isn't good in Conv layers, but have you tested it? Start with that and proceed thusly:

    1. Add dropout to a Conv layer that is close to the output and has a large depth (many channels).
    2. Try not going deep. This is the reverse of "go deep", and you should try it before going deep: make sure you have a simple model that doesn't overfit first, then try adding layers.
    3. If you are still overfitting with dropout in place, try removing neurons by giving your later Conv layers less depth.
    4. Do what Z.Kal says: multiply your dataset by transforming it.
    5. And if none of that makes a difference, accept the fact that your architecture is probably wrong. Buried deep inside it is a way for it to store all the data you feed it verbatim instead of generalizing from it. Consider making a squeeze point: a layer that is small compared to the input data.
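Item 1 needs no custom code in Caffe itself: it is a layer with `type: "Dropout"` and `dropout_param { dropout_ratio: ... }` placed after the chosen Conv layer. To show what such a layer does to a C × H × W feature map, here is a pure-Python sketch, assuming the inverted-dropout scaling Caffe's Dropout layer uses (zero each activation with probability `ratio` while training and scale the survivors by 1 / (1 - ratio); pass everything through at test time). `dropout_feature_map` is a hypothetical name, not Caffe code.

```python
import random
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def dropout_feature_map(x: FeatureMap, ratio: float, train: bool,
                        rng: random.Random) -> FeatureMap:
    """Element-wise inverted dropout on a C x H x W feature map.

    Training: each activation is zeroed with probability `ratio` and the
    survivors are multiplied by 1 / (1 - ratio), so the expected value of
    every activation stays the same. Test: the input is returned unchanged.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not train:
        return [[row[:] for row in channel] for channel in x]
    scale = 1.0 / (1.0 - ratio)
    # rng.random() is uniform in [0, 1), so an activation survives with
    # probability 1 - ratio.
    return [[[v * scale if rng.random() >= ratio else 0.0 for v in row]
             for row in channel]
            for channel in x]


rng = random.Random(0)
fmap = [[[1.0] * 4 for _ in range(4)] for _ in range(8)]  # 8 channels, 4 x 4
out = dropout_feature_map(fmap, 0.5, train=True, rng=rng)
kept = sum(v != 0.0 for ch in out for row in ch for v in row)
print(kept, "of", 8 * 4 * 4, "activations kept")
```

Because the scaling happens at training time, nothing has to change in the deployed network: at test time the layer is an identity.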

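Items 3 and 5 both cut capacity: fewer channels in the later Conv layers, or a squeeze point that is small compared to its input. The arithmetic below is a rough sketch of how much that buys. It counts only convolution weights and biases (BatchNorm/Scale parameters are ignored) and uses hypothetical sizes: 256 channels squeezed to 64 on a 64 × 64 feature map. The 1 × 1 squeeze, 3 × 3 conv, 1 × 1 expand pattern is one common way to build a squeeze point; the answer does not prescribe a specific design.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def activations(h: int, w: int, c: int) -> int:
    """Number of values in an h x w feature map with c channels."""
    return h * w * c


# Hypothetical sizes: 256 channels, squeezed to 64, on a 64 x 64 feature map.
plain = conv_params(3, 256, 256)              # 590080
squeezed = (conv_params(1, 256, 64)           # 1x1 squeeze 256 -> 64: 16448
            + conv_params(3, 64, 64)          # 3x3 on the narrow map: 36928
            + conv_params(1, 64, 256))        # 1x1 expand 64 -> 256: 16640
print("plain 3x3 conv params:", plain)
print("squeeze/expand params:", squeezed)     # 70016
print("activations at the squeeze point:", activations(64, 64, 64),
      "vs", activations(64, 64, 256))         # 262144 vs 1048576
```

The narrow layer holds a quarter of the activations of the wide one, which limits how much of the input it can pass through verbatim.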
    [Image: chart of things to try when a deep learning model is having trouble (not preserved)]
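Item 4, transforming the dataset, has one extra catch in a fully convolutional network: the output is a dense map, so every geometric transform applied to an input image must be applied identically to its target map, or the labels stop lining up. A minimal plain-Python sketch, assuming single-channel maps stored as nested lists and a target with the same height and width as the image. `hflip`, `crop` and `augment_pair` are hypothetical helpers, not Caffe functions.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]  # a single-channel H x W map, indexed [row][col]


def hflip(g: Grid) -> Grid:
    """Mirror a map left to right."""
    return [row[::-1] for row in g]


def crop(g: Grid, top: int, left: int, size: int) -> Grid:
    """Cut a size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in g[top:top + size]]


def augment_pair(image: Grid, target: Grid, size: int,
                 rng: random.Random) -> Tuple[Grid, Grid]:
    """Randomly flip and crop an image and its dense target the same way."""
    h, w = len(image), len(image[0])
    if size > h or size > w:
        raise ValueError("crop size larger than input")
    if rng.random() < 0.5:  # flip both maps, or neither
        image, target = hflip(image), hflip(target)
    top = rng.randint(0, h - size)   # one crop offset shared by both maps
    left = rng.randint(0, w - size)
    return crop(image, top, left, size), crop(target, top, left, size)


rng = random.Random(1)
img = [[float(r * 10 + c) for c in range(6)] for r in range(6)]
tgt = [row[:] for row in img]  # identical target makes alignment easy to check
aug_img, aug_tgt = augment_pair(img, tgt, 4, rng)
print(aug_img == aug_tgt)  # the pair stays aligned after augmentation
```

Each call yields a different flip/crop combination, so one labelled example becomes many. If target channels mean something left/right-specific, a horizontal flip would also need those channels swapped, which this sketch does not do.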

    Update 2020/10/22: After several years of coding convolutions and investigating why my embedding layers seemed to have unreasonably high covariance, I tracked the problem down to dropout. Dropout encourages covariance (which is not good, btw). Instead of dropout, I now use other regularizers or skip regularization altogether and focus on initialization and architecture. Here is a (bad) video I made showing how to effectively train a super deep network of 400 convolution layers and the tricks used to get it trained and operational.
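The update does not say which regularizers replaced dropout, but the question asks whether regularization might help. The most direct option in Caffe is weight decay, set with `weight_decay` (and optionally `regularization_type: "L1"` or `"L2"`, default L2) in the solver prototxt, with per-parameter `decay_mult` scaling it. The sketch below is a hypothetical pure-Python stand-in, not Caffe code. It follows the order of operations in Caffe's SGD solver: add the regularization term to the gradient, fold it into the momentum history, then subtract the history from the weights.

```python
from typing import List


def sgd_step(w: List[float], grad: List[float], history: List[float],
             lr: float, momentum: float, weight_decay: float,
             reg_type: str = "L2") -> None:
    """One in-place SGD update with weight decay and momentum."""
    for i, wi in enumerate(w):
        if reg_type == "L2":
            # L2: gradient of 0.5 * weight_decay * w^2 is weight_decay * w
            g = grad[i] + weight_decay * wi
        elif reg_type == "L1":
            # L1: gradient of weight_decay * |w| is weight_decay * sign(w)
            g = grad[i] + weight_decay * ((wi > 0) - (wi < 0))
        else:
            raise ValueError("reg_type must be 'L1' or 'L2'")
        history[i] = momentum * history[i] + lr * g
        w[i] = wi - history[i]


w, hist = [1.0, -2.0], [0.0, 0.0]
for _ in range(3):
    # zero data gradient: only the decay term moves the weights
    sgd_step(w, [0.0, 0.0], hist, lr=0.1, momentum=0.9, weight_decay=0.0005)
print(w)  # both weights have shrunk slightly toward zero
```

With a zero data gradient only the decay term acts, which makes it easy to see that it pulls every weight toward zero and so limits how much detail the network can memorize.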