I want to train a deep learning model (say SSD or YOLO) for object detection. The object I want to detect has a very high aspect ratio, say a pencil. I want the output bounding box to be as close as possible to the object, with a similar aspect ratio. How should I optimize the model for this? Should I adjust all the aspect ratios of the pre-defined boxes to make them closer to the real object? In my case, the object is always in one orientation. Thanks
Yes, it would be better to use anchors/default boxes with aspect ratios similar to what you see in your data.
For example, if you use the TF Object Detection API, each model comes with a config file holding its configuration.
e.g.: https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config
anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.95
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.3333
  }
}
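Before editing the config, it can help to look at the aspect-ratio distribution of your ground-truth boxes. A minimal sketch (the `aspect_ratios` helper and the example boxes are hypothetical, not part of the API):

```python
# Hypothetical helper: inspect the width/height aspect ratios of your
# ground-truth boxes to decide which anchor aspect_ratios to keep.
def aspect_ratios(boxes):
    """boxes: list of (xmin, ymin, xmax, ymax) ground-truth boxes."""
    return [(x2 - x1) / (y2 - y1) for (x1, y1, x2, y2) in boxes]

# Example: pencil-like (landscape) boxes, 10x wider than tall
boxes = [(0, 0, 200, 20), (10, 5, 310, 35)]
print(aspect_ratios(boxes))  # -> [10.0, 10.0]
```

If most of your boxes come out around, say, 10.0, you would replace the default `aspect_ratios` values with ones near that number.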
Usually the term aspect ratio refers to width/height.
So, if you expect only landscape-like objects, you would keep only the aspect ratios greater than 1 (2.0, 3.0).
Also, just to emphasize this point: choosing aspect ratios that match the objects you expect is an approach you can find in the literature.
For example, in the YOLOv3 paper (https://arxiv.org/pdf/1804.02767.pdf), Redmon chose the anchors by clustering the bounding-box dimensions in COCO, i.e. after analyzing the most probable object shapes in the dataset.
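The clustering idea above can be sketched with a naive k-means over (width, height) pairs. Note this is a simplification for illustration: the YOLO papers cluster with an IoU-based distance rather than the Euclidean distance used here, and `kmeans_anchors` is a hypothetical helper, not from any library.

```python
import random

def kmeans_anchors(whs, k, iters=50, seed=0):
    """Naive k-means on (width, height) pairs of ground-truth boxes.
    Returns k cluster centers to use as anchor sizes."""
    random.seed(seed)
    centers = random.sample(whs, k)
    for _ in range(iters):
        # Assign each box to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for w, h in whs:
            i = min(range(k),
                    key=lambda j: (w - centers[j][0]) ** 2 + (h - centers[j][1]) ** 2)
            clusters[i].append((w, h))
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

# Example: two clearly separated box shapes -> two anchors,
# one landscape (~105x11) and one portrait (~11x105).
whs = [(100, 10), (110, 12), (10, 100), (12, 110)]
print(kmeans_anchors(whs, k=2))
```

Running this on your own boxes gives anchor (width, height) pairs, from which the aspect ratios for the config follow directly as width/height.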