Tags: tensorflow, object-detection, evaluation

Are these the expected results of TensorFlow object detection evaluation using model_main.py?


I am running TensorFlow Object Detection API training and evaluation on a customized dataset with 8 classes using model_main.py, and I have two questions about the outcome:

  1. The total loss started going up (relatively) after 10k steps: it dropped below 1 after 8,000 steps, but then rose slowly from 10k to 80k steps and ended at about 1.4. Why would this happen?

  2. Regarding the evaluation results, why does only IoU=0.50 reach 0.966 precision while the rest are below 0.5, as shown below?

Accumulating evaluation results...
DONE (t=0.07s).
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.471
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.966
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.438
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.471
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.447
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.562
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.587
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
INFO:tensorflow:Finished evaluation at 2019-05-06-03:56:37
INFO:tensorflow:Saving dict for global step 80000: DetectionBoxes_Precision/mAP

Solution

  • Yes, these results are reasonable. Answering your questions:

    1. The total loss shown on TensorBoard is actually the evaluation loss, and if this loss starts going up, your model is probably overfitting. See an earlier answer to a similar case here.
    2. The evaluation results follow the COCO evaluation format. Precision and recall are reported across different IoU thresholds, different object areas, and different maximum numbers of detections (maxDets). For example, [ IoU=0.50:0.95 | area= all | maxDets=100 ] means the precision is averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (a detection counts as a true positive only if its IoU with a ground-truth box meets the threshold), over objects of all sizes, with at most 100 detections per image. A lower IoU threshold means more detections are counted as true positives, so IoU=0.50 has the highest precision because it admits the largest number of positive detections, while at IoU=0.95 far fewer detections qualify. IoU=0.50:0.95 is the average of the precisions across these IoU thresholds, so that value is lower than the one at IoU=0.50.

    By the way, the -1.000 for area=small and area=medium means those categories are absent from your dataset (see here), i.e. all objects in your ground truth fall into the large-area range. A sketch of the evaluation call that produces these numbers is shown below.
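
    For reference, here is a minimal sketch of running this kind of COCO-style bounding-box evaluation directly with pycocotools (the library behind these metrics); the JSON file names are hypothetical placeholders for ground-truth annotations and detections in COCO format.

        # Minimal sketch: COCO-style bbox evaluation with pycocotools.
        # "ground_truth.json" and "detections.json" are hypothetical file names
        # for annotations/detections in COCO JSON format.
        from pycocotools.coco import COCO
        from pycocotools.cocoeval import COCOeval

        coco_gt = COCO("ground_truth.json")           # ground-truth annotations
        coco_dt = coco_gt.loadRes("detections.json")  # model detections

        coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
        print(coco_eval.params.iouThrs)   # 0.50, 0.55, ..., 0.95 -- the 0.50:0.95 range
        print(coco_eval.params.areaRng)   # pixel-area ranges for all/small/medium/large
        print(coco_eval.params.maxDets)   # [1, 10, 100]

        coco_eval.evaluate()
        coco_eval.accumulate()
        coco_eval.summarize()             # prints an AP/AR table like the one above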

    Here is a good illustration of why a lower IoU threshold means more detections count as true positives (image source):

    [image: detections overlapping a ground-truth box at different IoU values]

    With an IoU threshold of 0.4, all three detections count as true positives; with a threshold of 0.6, only two are correct; and with a threshold of 0.9, only one detection is correct. A small numeric sketch of this is shown below.
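
    Here is a small, self-contained sketch of the same idea with hypothetical boxes; the three detections have IoUs of roughly 0.92, 0.68 and 0.47 with the ground truth, so they flip from true positive to false positive as the threshold rises.

        # Sketch with hypothetical boxes: count true positives at different IoU thresholds.
        def iou(box_a, box_b):
            # boxes are (x1, y1, x2, y2)
            ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            return inter / (area_a + area_b - inter)

        ground_truth = (0, 0, 100, 100)
        detections = [(2, 2, 102, 102),    # IoU ~0.92
                      (10, 10, 110, 110),  # IoU ~0.68
                      (20, 20, 120, 120)]  # IoU ~0.47

        for threshold in (0.4, 0.6, 0.9):
            true_positives = sum(iou(d, ground_truth) >= threshold for d in detections)
            print(f"IoU threshold {threshold}: {true_positives} true positive(s)")
        # prints 3 true positives at 0.4, 2 at 0.6 and 1 at 0.9, matching the illustration above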

    Some further reading regarding how mAP is calculated.
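
    As a complement to that reading, here is a rough sketch of how average precision is typically computed for a single class at a single IoU threshold (standard all-point interpolation, not the exact 101-point scheme COCO uses; the scores and match flags are hypothetical). mAP is then the mean of this value over classes, and for COCO also over the IoU thresholds 0.50:0.95.

        # Rough sketch: AP for one class at one IoU threshold as the area under
        # the precision-recall curve (all-point interpolation).
        import numpy as np

        def average_precision(scores, is_true_positive, num_ground_truth):
            order = np.argsort(-np.asarray(scores))            # sort detections by score, descending
            flags = np.asarray(is_true_positive, float)[order]
            tp = np.cumsum(flags)
            fp = np.cumsum(1.0 - flags)
            recall = tp / num_ground_truth
            precision = tp / (tp + fp)
            # make the precision envelope non-increasing, then integrate it over recall
            precision = np.maximum.accumulate(precision[::-1])[::-1]
            return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

        # hypothetical detections: scores and whether each one matched a ground-truth box
        print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], num_ground_truth=4))  # ~0.69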