I have a classification dataset composed of a training set of 8000 images of size 32x32x3 and a test set of 2000 images of the same size.
The task is very simple: distinguishing vehicles from background. I am using cross-entropy as the cost function.
The net I am using is almost the same as the one in DeepMNIST, except that the first filter has size 3x... instead of 1x... because the images are in colour, and the output has size two because there are only two classes: vehicle or not vehicle. Seeing the results of this relatively straightforward task has led me to several questions:
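For reference, here is roughly the kind of architecture I mean, written as a tf.keras sketch (layer sizes follow the DeepMNIST tutorial; this is only an illustration, not my exact code):

```python
import tensorflow as tf

# DeepMNIST-style net adapted to 32x32x3 colour inputs and 2 output classes
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                           input_shape=(32, 32, 3)),   # 3 input channels (colour)
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),    # two classes: vehicle / background
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",  # cross-entropy cost
              metrics=["accuracy"])
```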
-First, if I do not use a large enough batch size (>200), I get stuck almost every time at 62% accuracy (in a local optimum) on both sets, which is not sufficient for my needs.
-Secondly, whenever I use the right optimizer (Adam) with the right batch size and learning rate, I get up to 92%; however, the outputs are always disturbingly confident, like [0.999999999 0.000000000001].
This should not happen, as the task is difficult.
Consequently, when I go fully convolutional to create a heatmap, I get 1.000001 almost everywhere due to the saturation.
What am I doing wrong? Do you think whitening would solve the problem? Batch normalization? Something else? What am I facing?
That's a sign of overfitting. If you train a large enough model on a small dataset for long enough, your confidences eventually saturate to 0's and 1's. Hence, the same techniques that prevent overfitting (regularization penalties, dropout, early stopping, data augmentation) will help here.
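As a rough illustration, here is how those knobs look in tf.keras (a sketch only; layer sizes and hyperparameter values are placeholders, not tuned for your data):

```python
import tensorflow as tf

# L2 weight penalty, attached per layer via kernel_regularizer
l2 = tf.keras.regularizers.l2(1e-4)
regularized_dense = tf.keras.layers.Dense(1024, activation="relu",
                                          kernel_regularizer=l2)

# Dropout, typically placed just before the final classifier layer
dropout = tf.keras.layers.Dropout(0.5)

# Early stopping: halt training when the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)

# model.fit(x_train, y_train, validation_split=0.1,
#           batch_size=200, epochs=100, callbacks=[early_stop])
```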
My first step for a tiny dataset like this would be to augment it with noise-corrupted examples. I.e., for your case I would add 800k noise-corrupted examples with the original labels and train on those.
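Something along these lines (a numpy sketch; x_train/y_train and the noise level are placeholders, and the images are assumed to be floats in [0, 1]):

```python
import numpy as np

def augment_with_noise(x, y, copies=100, sigma=0.05, seed=0):
    """Return `copies` noise-corrupted versions of each example, keeping the original labels."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for _ in range(copies):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)  # additive Gaussian noise
        xs.append(np.clip(noisy, 0.0, 1.0))               # keep pixel values in range
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)

# 8000 originals x 100 noisy copies = 800k augmented examples
# x_aug, y_aug = augment_with_noise(x_train, y_train, copies=100)
```

In practice you would probably generate the noise on the fly per batch rather than materialising 800k images in memory, but the idea is the same.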