Now I am trying to train object detection (YOLOv1) using this code. At the beginning I was using `momentum` and `weight_decay`, but after a couple of epochs the training loss became `NaN`. As far as I know this is caused by exploding gradients, so I searched for ways to get rid of the `NaN` and ended up dropping `momentum` and `weight_decay`. As a result I no longer get `NaN`, however my model does not converge as I expected: when I calculated mAP it was only 0.29. I am using the VOC 2007 and 2012 data for training and the VOC 2007 test set for testing.
So my question is the following: how can I get rid of `NaN` while training? Would appreciate any suggestions here.
After checking your code, I saw that after the first epoch you set the learning rate to 0.01 until epoch 75. In my opinion, that large learning rate is the main reason your gradients vanished/exploded. Normally the learning rate is kept around 0.001 and scaled by factors such as 2, 1, and 0.1.
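As a minimal sketch of that kind of schedule (assuming a PyTorch training loop; the stand-in model, the epoch milestones, and the total epoch count below are illustrative assumptions, not your actual setup):

```python
import torch
from torch import nn, optim

# Hypothetical stand-in for your YOLOv1 network.
model = nn.Conv2d(3, 30, kernel_size=3)
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Keep the base rate around 1e-3 and scale it by 2, 1, and 0.1;
# the epoch boundaries here are assumptions for illustration.
def lr_factor(epoch):
    if epoch < 5:
        return 2.0   # brief higher-rate phase: lr = 2e-3
    if epoch < 75:
        return 1.0   # main phase: lr = 1e-3
    return 0.1       # fine-tuning phase: lr = 1e-4

scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(135):
    # ... run one training epoch over VOC 2007+2012 here ...
    scheduler.step()
```

`LambdaLR` multiplies the initial learning rate by the returned factor each epoch, so the whole schedule stays parameterized by the single base rate of 1e-3.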
Follow the config in this repo (the most popular YOLOv1 implementation according to Papers with Code) and you can see their configuration setup. You can keep the hyper-parameters from your question, `momentum=0.9` and `decay=0.0005`.
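A hedged sketch of what that optimizer setup could look like in PyTorch (the stand-in model, the dummy loss, and the gradient-clipping safety net are my own illustrative additions, not part of the repo's config):

```python
import torch
from torch import nn, optim

model = nn.Conv2d(3, 30, kernel_size=3)  # hypothetical stand-in

# Restore the hyper-parameters from the question, combined with
# the smaller base learning rate suggested above.
optimizer = optim.SGD(model.parameters(), lr=1e-3,
                      momentum=0.9, weight_decay=5e-4)

loss = model(torch.randn(1, 3, 448, 448)).sum()  # dummy loss
optimizer.zero_grad()
loss.backward()

# Optional safety net against exploding gradients (an assumption,
# not the repo's method): clip the global gradient norm.
nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()
```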
Note: be careful that the batch-norm momentum in TensorFlow is equal to 1 - momentum in PyTorch.
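To make that correspondence concrete, a small sketch (the layer size is arbitrary; the Keras line is kept as a comment so the snippet only needs torch installed):

```python
import torch
from torch import nn

# PyTorch updates running statistics as
#   running = (1 - momentum) * running + momentum * batch_stat
# with a default momentum of 0.1.
bn_pt = nn.BatchNorm2d(64, momentum=0.1)

# TensorFlow/Keras inverts the convention:
#   moving = momentum * moving + (1 - momentum) * batch_stat
# so the equivalent Keras setting is 1 - 0.1 = 0.9:
# bn_tf = tf.keras.layers.BatchNormalization(momentum=0.9)
```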
Finally, the number of parameters before and after training should be the same, so if your model is heavier or lighter after the training process, it means something is wrong in your training code.
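One way to check this is to compare the parameter count before and after training; a minimal sketch (the model is a hypothetical stand-in):

```python
import torch
from torch import nn

def count_params(model: nn.Module) -> int:
    """Total number of learnable parameters in the model."""
    return sum(p.numel() for p in model.parameters())

model = nn.Conv2d(3, 30, kernel_size=3)  # hypothetical stand-in

n_before = count_params(model)
# ... train the model here ...
n_after = count_params(model)

# The count must not change during training; a mismatch points to a
# bug such as layers being rebuilt or added inside the training loop.
assert n_before == n_after, f"param count changed: {n_before} -> {n_after}"
```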