I've implemented a home-brewed ZFNet (prototxt) for my research. After 20k iterations with this definition, the test accuracy stays at ~0.001 (i.e., 1/1000), and both the test loss and the training loss stay at ~6.9, which suggests that the net is just guessing among the 1k classes. I've thoroughly checked the whole definition and tried changing some of the hyper-parameters before restarting training, but to no avail: the same results show up on the screen.
Could anyone shed some light on this? Thanks in advance!
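As a quick sanity check on those numbers, here is a minimal sketch of what a uniform guess over 1000 classes yields, assuming the loss is the usual softmax cross-entropy with natural logarithm (the function name `chance_level` is made up for illustration):

```python
import math

NUM_CLASSES = 1000  # 1k-way classification, as in the post


def chance_level(num_classes):
    """Return (accuracy, loss) for a uniform guess over num_classes."""
    accuracy = 1.0 / num_classes
    # Cross-entropy of the true class under a uniform prediction: -ln(1/N).
    loss = -math.log(accuracy)
    return accuracy, loss


accuracy, loss = chance_level(NUM_CLASSES)
print(f"chance accuracy: {accuracy:.4f}, chance loss: {loss:.4f}")
# prints: chance accuracy: 0.0010, chance loss: 6.9078
```

Both values match what I see, so the net appears to be sitting exactly at chance level rather than diverging.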
The hyper-parameters in the prototxt are derived from the paper [1]. All the inputs and outputs of the layers seem correct, matching Fig. 3 in the paper.
The tweaks are:
- the `crop`-s of the input for both training and testing are set to `225` instead of `224`, as discussed in #33;
- one-pixel zero padding for `conv3`, `conv4`, and `conv5` to make the sizes of the blobs consistent [1];
- filler types for all learnable layers changed from `constant` in [1] to `gaussian` with `std: 0.01`;
- `weight_decay` changed from `0.0005` to `0.00025`, as suggested by @sergeyk in PR #33.
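To show why the crop size and the paddings matter, here is a rough sketch that traces the spatial blob size through the net. It assumes Caffe's rounding (convolution rounds down, pooling rounds up) and kernel/stride values as I read them from Fig. 3 in [1]; the `LAYERS` table and the helper names are illustrative, not copied from my prototxt.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution (rounds down, as Caffe does)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Output size of a pooling layer (rounds up, as Caffe does)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


# Assumed layer parameters (my reading of Fig. 3 in [1], plus the pad tweak).
LAYERS = [
    ("conv1", conv_out, dict(kernel=7, stride=2)),
    ("pool1", pool_out, dict(kernel=3, stride=2)),
    ("conv2", conv_out, dict(kernel=5, stride=2)),
    ("pool2", pool_out, dict(kernel=3, stride=2)),
    ("conv3", conv_out, dict(kernel=3, pad=1)),
    ("conv4", conv_out, dict(kernel=3, pad=1)),
    ("conv5", conv_out, dict(kernel=3, pad=1)),
    ("pool5", pool_out, dict(kernel=3, stride=2)),
]


def blob_sizes(crop_size):
    """Return {layer name: spatial size} for a square input crop."""
    sizes = {}
    size = crop_size
    for name, fn, params in LAYERS:
        size = fn(size, **params)
        sizes[name] = size
    return sizes


for crop in (224, 225):
    print(crop, blob_sizes(crop))
```

Under these assumptions a 225 crop gives 110, 55, 26, 13, 13, 13, 13, 6, whereas a 224 crop gives 109 after `conv1` and drifts away from the sizes in the figure.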
[1] Zeiler, M. and Fergus, R. Visualizing and Understanding Convolutional Networks, ECCV 2014.
As for the poor part, I pasted it here.
A few suggestions:
- Change the initialization from `gaussian` to `xavier`.
- Work with `"PReLU"` activations instead of `"ReLU"`. Once your net converges you can fine-tune to remove them.
- Try reducing `base_lr` by an order of magnitude (or even two orders).
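To give a feel for the first suggestion, here is a small sketch comparing the fixed `std: 0.01` with the standard deviation that Caffe's `xavier` filler would produce. It assumes the filler's default fan-in normalisation, i.e. weights drawn uniformly from [-sqrt(3/n), sqrt(3/n)] with n the fan-in. The channel counts and kernel sizes in `LAYERS` are my assumption of a ZFNet-like layout, not values taken from your prototxt.

```python
import math

GAUSSIAN_STD = 0.01  # the std used for every layer in the question

# Assumed (name, input_channels, kernel_height, kernel_width) per layer.
LAYERS = [
    ("conv1", 3, 7, 7),
    ("conv2", 96, 5, 5),
    ("conv3", 256, 3, 3),
    ("conv4", 384, 3, 3),
    ("conv5", 384, 3, 3),
    ("fc6", 256, 6, 6),
]


def xavier_std(fan_in):
    """Std of a uniform draw from [-sqrt(3/fan_in), sqrt(3/fan_in)]."""
    limit = math.sqrt(3.0 / fan_in)
    # A uniform distribution on [-a, a] has standard deviation a / sqrt(3).
    return limit / math.sqrt(3.0)


def compare():
    """Return (name, fan_in, xavier std) for each assumed layer."""
    rows = []
    for name, channels, height, width in LAYERS:
        fan_in = channels * height * width
        rows.append((name, fan_in, xavier_std(fan_in)))
    return rows


for name, fan_in, std in compare():
    print(f"{name}: fan_in={fan_in:5d}  xavier std={std:.4f}  "
          f"gaussian std={GAUSSIAN_STD:.4f}")
```

Under these assumptions `xavier` starts `conv1` with a spread about eight times larger than 0.01 and the other convolutions with roughly twice as much, so the fixed 0.01 may simply be too small for the early layers to pass a useful signal.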