python object-detection yolo faster-rcnn

Do anchor box sizes get refined when training object detection models like Faster R-CNN, YOLO and SSD?


I was learning how the object detection models Faster R-CNN, YOLOv3 and SSD work, and I am confused about whether the anchor box sizes get refined.


Solution

  • Yes, the size (and position) of the anchor boxes gets refined during training. As you said in a comment, anchor boxes are nothing more than a set of reference boxes placed at fixed positions on the output grid. Each grid cell additionally predicts the objectness score as well as the label and the exact coordinates of the bounding box.

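    These predicted coordinates are the box size refinement you are asking about: the network outputs adjustments relative to the fixed reference box rather than changing the reference box itself. How this regression is implemented differs between networks (SSD, YOLO, Faster R-CNN, ...).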

    I encourage you to read the literature, especially the YOLO papers, which are very clear. In "YOLO9000: Better, Faster, Stronger" (freely available online), bounding box refinement is explained in great detail on page 3.

    All of this is learnt during training: take a look at the YOLO loss function on page 4 of the paper "You Only Look Once: Unified, Real-Time Object Detection".
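
To make the refinement concrete, here is a minimal sketch of how a predicted box can be decoded from an anchor, following the parameterisation given in the YOLO9000 paper (centre = cell offset + sigmoid of the raw output, size = anchor size × exponential of the raw output). The function name `decode_yolov2_box`, the anchor size and the cell index are made up for illustration and are not taken from any particular implementation; Faster R-CNN and SSD use a different parameterisation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov2_box(t, cell, anchor):
    """Turn raw network outputs into a box, in grid-cell units.

    t      -- (t_x, t_y, t_w, t_h): raw outputs predicted by the network
    cell   -- (c_x, c_y): top-left corner of the grid cell
    anchor -- (p_w, p_h): fixed anchor (prior) width and height
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = c_x + sigmoid(t_x)   # centre is constrained to stay inside the cell
    b_y = c_y + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)  # anchor width rescaled by a learnt factor
    b_h = p_h * math.exp(t_h)  # anchor height rescaled by a learnt factor
    return b_x, b_y, b_w, b_h


anchor = (3.0, 2.0)  # hypothetical anchor size; the same value is used for every call
cell = (4, 6)        # hypothetical grid cell

# All-zero outputs: centre in the middle of the cell, anchor size unchanged.
print(decode_yolov2_box((0.0, 0.0, 0.0, 0.0), cell, anchor))  # (4.5, 6.5, 3.0, 2.0)

# Non-zero outputs shift the centre and rescale the anchor.
print(decode_yolov2_box((1.2, -0.4, 0.5, -0.3), cell, anchor))
```

In this sketch the anchor is only an input: what training changes is the weights that produce `t`, and the loss compares the decoded box with the ground-truth box.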