Now I am trying to train object detection (YOLOv1) using this code. At the beginning I was using `momentum` and `weight_decay`, but after a couple of epochs the training loss became `NaN`. As far as I know this is caused by exploding gradients, so I searched for ways to get rid of the `NaN` and ended up dropping `momentum` and `weight_decay`. As a result I no longer get `NaN`, however my model does not converge as I expected: when I calculated mAP it was only 0.29. I am using the VOC 2007 and 2012 data for training and the VOC 2007 test set for testing.
So my question is the following: how can I get rid of `NaN` while training? Would appreciate any suggestions here.
After checking your code, I saw that after the first epoch you set the learning rate to 0.01 until epoch 75. In my opinion, that large learning rate is the main reason your gradients vanished/exploded. Normally the learning rate is kept around 0.001 and scaled by factors such as 2, 1, and 0.1.
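As a minimal sketch of that kind of schedule (assuming a PyTorch training loop; the stand-in model, the epoch milestones, and the total epoch count below are illustrative assumptions, not your actual setup):

```python
import torch
from torch import nn, optim

# Hypothetical stand-in for your YOLOv1 network.
model = nn.Conv2d(3, 30, kernel_size=3)
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Keep the base rate around 1e-3 and scale it by 2, 1, and 0.1;
# the epoch boundaries here are assumptions for illustration.
def lr_factor(epoch):
    if epoch < 5:
        return 2.0   # brief higher-rate phase: lr = 2e-3
    if epoch < 75:
        return 1.0   # main phase: lr = 1e-3
    return 0.1       # fine-tuning phase: lr = 1e-4

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(135):
    # ... run one training epoch over VOC 2007+2012 here ...
    scheduler.step()
```

`LambdaLR` multiplies the initial learning rate by the returned factor each epoch, so the whole schedule stays parameterized by the single base rate of 1e-3.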
Follow the config in this repo (the most popular YOLOv1 implementation according to Papers with Code) and you can see their configuration setup. You can keep the hyper-parameters from your question, `momentum=0.9` and `decay=0.0005`.
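A hedged sketch of what that optimizer setup could look like in PyTorch (the stand-in model, the dummy loss, and the gradient-clipping safety net are my own illustrative additions, not part of the repo's config):

```python
import torch
from torch import nn, optim

model = nn.Conv2d(3, 30, kernel_size=3)  # hypothetical stand-in

# Restore the hyper-parameters from the question, combined with
# the smaller base learning rate suggested above.
optimizer = optim.SGD(model.parameters(), lr=1e-3,
                      momentum=0.9, weight_decay=5e-4)

loss = model(torch.randn(1, 3, 448, 448)).sum()  # dummy loss
optimizer.zero_grad()
loss.backward()

# Optional safety net against exploding gradients (an assumption,
# not the repo's method): clip the global gradient norm.
nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()
```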
Note: be careful that the batch-norm momentum in TensorFlow is equal to 1 - momentum in PyTorch.
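To make that correspondence concrete, a small sketch (the layer size is arbitrary; the Keras line is kept as a comment so the snippet only needs torch installed):

```python
import torch
from torch import nn

# PyTorch updates running statistics as
#   running = (1 - momentum) * running + momentum * batch_stat
# with a default momentum of 0.1.
bn_pt = nn.BatchNorm2d(64, momentum=0.1)

# TensorFlow/Keras inverts the convention:
#   moving = momentum * moving + (1 - momentum) * batch_stat
# so the equivalent Keras setting is 1 - 0.1 = 0.9:
# bn_tf = tf.keras.layers.BatchNormalization(momentum=0.9)
```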
Finally, the number of parameters before and after training should be the same, so if your model is heavier or lighter after the training process, it means something is wrong in your training code.
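One way to check this is to compare the parameter count before and after training; a minimal sketch (the model is a hypothetical stand-in):

```python
import torch
from torch import nn

def count_params(model: nn.Module) -> int:
    """Total number of learnable parameters in the model."""
    return sum(p.numel() for p in model.parameters())

model = nn.Conv2d(3, 30, kernel_size=3)  # hypothetical stand-in

n_before = count_params(model)
# ... train the model here ...
n_after = count_params(model)

# The count must not change during training; a mismatch points to a
# bug such as layers being rebuilt or added inside the training loop.
assert n_before == n_after, f"param count changed: {n_before} -> {n_after}"
```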