artificial-intelligence object-detection

Object detection using SSD: which approach gives higher accuracy and recall?

I have a question that is more about SSD fundamentals than about any specific implementation:

My goal is to detect all "foreground" objects and draw bounding boxes around them; I don't much care whether a given object is a "dog", a "cat", etc. Assuming there are 10 different categories of objects in the foreground, which of the following will give me better overall accuracy and recall?

1. Train SSD with 10 different classes; or
2. Train SSD with a single class, labelling objects from all 10 categories as "foreground".

Thank you very much for your help.

Solution

After several months of research and experiments, I would say that for SSD, option #1 ("train SSD with 10 different classes") reaches higher accuracy and recall, provided that misclassification is treated as "don't care": a detection counts as correct as long as it lands on a foreground object, whatever class label it was given. The reason is that SSD relies on the shape and size of the bounding box for classification, in addition to the feature map. So the more classes we train, the more "detectors" the network has, which gives SSD more discerning power for this particular problem.
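To make the "don't care about misclassification" scoring concrete, here is a minimal sketch of a class-agnostic recall computation in plain Python. It is not taken from any SSD implementation: the function names (`iou`, `class_agnostic_recall`), the `(x1, y1, x2, y2)` box format, the `(box, class_name, score)` detection tuple, and the 0.5 score and IoU thresholds are all assumptions chosen for illustration. The same function could be applied to the output of the 10-class model and of the 1-class model to compare them on equal terms.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_recall(gt_boxes, detections, score_thr=0.5, iou_thr=0.5):
    """Fraction of ground-truth boxes matched by any detection.

    detections: list of (box, class_name, score). The class name is
    ignored on purpose, so a "dog" box on a cat still counts as a hit.
    """
    if not gt_boxes:
        return 0.0  # recall is undefined without ground truth; 0.0 by convention here

    # Keep confident detections and match the highest-scoring ones first.
    kept = sorted(
        (d for d in detections if d[2] >= score_thr),
        key=lambda d: d[2],
        reverse=True,
    )

    matched = set()  # indices of ground-truth boxes already claimed
    for box, _class_name, _score in kept:
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(gt_boxes):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)

    return len(matched) / len(gt_boxes)


if __name__ == "__main__":
    # Two foreground objects; the detector finds one (with an arbitrary
    # class label) and produces one box that overlaps nothing.
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predictions = [
        ((0, 0, 10, 10), "cat", 0.9),
        ((50, 50, 60, 60), "dog", 0.8),
    ]
    print(class_agnostic_recall(ground_truth, predictions))  # prints 0.5
```

Each ground-truth box can be claimed by at most one detection, so duplicate boxes on the same object do not inflate the recall figure.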