Tags: python, tensorflow, object-detection

Object Detection API - Processing background images and objects labeled multiple times


I use the OD-API to train models. I have two questions about how it processes background images and images in which the same object is labeled twice (or more) under different label names, both when using faster_rcnn_resnet101 and SSD_mobilenet_v2.

1- When an image has no ground truth boxes (a background image), do we still generate anchor boxes for it in the case of fRCNN (or default boxes for the SSD), even though we have no GT boxes? Or will the whole image in such a case be a negative example?

2- When an image has two (or more) GT boxes with the same coordinates but different label names, does this cause issues when matching with anchor boxes (or default boxes for the SSD)? For example, will only one of the GT boxes be matched?

I would be glad for any help. I tried reading papers, tutorials and books but couldn't find answers, or maybe I am missing something. Regarding question 2, Prof. Andrew Ng says at 6:55 of this video about anchor boxes in YOLO that such cases, where we have multiple objects in the same grid cell, cannot be handled well. So maybe the same applies to my cases, even though I don't know what the result would be in my situation. I also think the files target_assigner.py and argmax_matcher.py contain some clues, but I can't really confirm that either.

Thank you in advance


Solution

  • 1) Anchor boxes are independent of the ground truth boxes; they are generated from the image shape (and the anchor configuration). The targets, on the other hand, are generated from the GT boxes and the anchors, and are used to train the bounding box regression head. If there are no ground truth boxes, no regression targets are generated and the whole image is used as negative samples for the classification head, while the regression head is unaffected (it only trains on positive samples). See the first sketch below.

    2) I am not 100% sure on this one, but as far as I can tell, the bounding box regression won't have a problem (if the bounding boxes are identical, their IoU with the anchors is identical and the target assigner will just pick one of the two), but classification might; the second sketch below illustrates this. IIRC there are ways to enable multi-label classification (although I have no experience with it), so that may help you out a bit. The best solution, though, would be not to annotate objects multiple times.
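
First sketch, for point 1. This is a minimal plain-NumPy illustration, not the actual OD-API code; generate_anchors, pairwise_iou and assign_targets are made-up helper names, and the single square anchor per cell is an assumption made to keep it short. It shows that anchors are created from the feature-map grid alone, and that with an empty ground-truth list every anchor simply becomes a negative classification sample.

    import numpy as np

    def generate_anchors(grid_h, grid_w, stride=16, size=32):
        # One square anchor per feature-map cell: depends only on the grid,
        # never on the ground-truth boxes.
        ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
        cy = (ys.ravel() + 0.5) * stride
        cx = (xs.ravel() + 0.5) * stride
        half = size / 2.0
        return np.stack([cy - half, cx - half, cy + half, cx + half], axis=1)

    def pairwise_iou(boxes_a, boxes_b):
        # IoU matrix of shape [len(boxes_a), len(boxes_b)]; boxes are [y1, x1, y2, x2].
        a = boxes_a[:, None, :]
        b = boxes_b[None, :, :]
        inter_h = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
        inter_w = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
        inter = inter_h * inter_w
        area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
        area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
        return inter / (area_a + area_b - inter)

    def assign_targets(anchors, gt_boxes, pos_iou=0.5):
        # Per-anchor class labels: 1 = matched to a GT box (positive), 0 = background.
        labels = np.zeros(len(anchors), dtype=np.int32)
        if len(gt_boxes) == 0:
            # Background image: no regression targets are created at all; every
            # anchor is just a negative sample for the classification head.
            return labels
        ious = pairwise_iou(anchors, np.asarray(gt_boxes, dtype=float))
        labels[ious.max(axis=1) >= pos_iou] = 1
        return labels

    anchors = generate_anchors(4, 4)                   # anchors exist regardless of GT boxes
    print(assign_targets(anchors, []))                 # background image: all zeros (negatives only)
    print(assign_targets(anchors, [[8, 8, 40, 40]]))   # the anchor covering the box becomes positive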
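
Second sketch, for point 2. Again a plain-NumPy illustration that only mimics what an argmax-style matcher (such as the one in argmax_matcher.py) does, not the OD-API itself; the class names are hypothetical. With two GT boxes that share the same coordinates, their IoU columns are identical, so the argmax tie-break picks the first one and the duplicate label never produces a target.

    import numpy as np

    def argmax_match(iou_matrix):
        # For each anchor (row), return the index of the GT box with the highest IoU.
        # np.argmax breaks ties by returning the first index.
        return iou_matrix.argmax(axis=1)

    # One anchor, two GT boxes with identical coordinates -> identical IoU columns.
    iou = np.array([[0.8, 0.8]])           # shape [num_anchors, num_gt]
    gt_classes = ["person", "pedestrian"]  # hypothetical duplicate annotations
    match = argmax_match(iou)[0]
    print(gt_classes[match])               # always "person": the duplicate label is dropped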