Tags: deep-learning, computer-vision, object-detection, yolo, darknet

YOLOv3 transfer learning performance is worse than the official weights


Recently, I trained YOLOv3 using transfer learning.

I used the following command to train my YOLOv3 weights:

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74  -gpus 0,1,2,3 -map | tee -a yolov3-official-transfer-learning.log

After submitting the weights from 500,200 batches to CodaLab to evaluate performance on the COCO dataset, I got the following result:

AP: 0.321
AP_50: 0.541
AP_75: 0.339
AP_small: 0.143
AP_medium: 0.332
AP_large: 0.450
AR_max_1: 0.284
AR_max_10: 0.434
AR_max_100: 0.454
AR_small: 0.257
AR_medium: 0.473
AR_large: 0.617

Compared to the official weights evaluated on CodaLab:

AP: 0.315
AP_50: 0.560
AP_75: 0.324
AP_small: 0.153
AP_medium: 0.334
AP_large: 0.430
AR_max_1: 0.278
AR_max_10: 0.433
AR_max_100: 0.456
AR_small: 0.267
AR_medium: 0.484
AR_large: 0.610

We can clearly see that AP_50 for the official weights is 1.9 points higher than for my self-trained version (0.560 vs. 0.541).
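For context, the AP_50 gap is the only large one in the official weights' favor; my weights actually score slightly higher on overall AP and AP_75. A small sketch computing the per-metric differences from the three headline numbers quoted above (positive means the official weights score higher):

```python
# Per-metric gap between the two CodaLab submissions quoted above.
mine = {"AP": 0.321, "AP_50": 0.541, "AP_75": 0.339}
official = {"AP": 0.315, "AP_50": 0.560, "AP_75": 0.324}

# Positive gap = official weights win on that metric.
gap = {k: round(official[k] - mine[k], 3) for k in mine}
print(gap)  # {'AP': -0.006, 'AP_50': 0.019, 'AP_75': -0.015}
```

So the self-trained model is only behind on the loose-overlap AP_50 metric, not across the board.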

By the way:

[1] I used AlexeyAB/darknet, not pjreddie/darknet, to train YOLOv3.

[2] I used COCO 2014 as my training dataset.

Can anyone explain this gap? And is it possible to reproduce the official result?


Solution

  • Did you use the default cfg? If so, you probably trained at a lower resolution and/or with smaller mini-batches than the authors, which means noisier (more stochastic) training and therefore lower AP.

    There is also a degree of randomness in training DNNs; I've seen networks reach slightly different APs with identical configurations. The YOLOv3 authors likely ran many training trials and published the best result, so on average an exact imitation of their training setup would produce slightly worse numbers.
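    The settings in question live in the [net] section of your yolov3.cfg. The values below are only illustrative (check your local copy, they are not guaranteed to match your defaults); raising width/height or lowering subdivisions brings training closer to the authors' setup, at the cost of GPU memory:

    ```
    [net]
    # batch is images per weight update; subdivisions splits it into
    # mini-batches of batch/subdivisions images per forward pass.
    # Fewer subdivisions = larger effective mini-batch, less noisy gradients.
    batch=64
    subdivisions=16
    # Network input resolution; larger values particularly help AP_small
    # but increase memory use and training time.
    width=416
    height=416
    ```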