python tensorflow object-detection object-detection-api mobilenet

SSD MobileNet v1: change feature map layout

I'm trying to retrain an SSD model to detect a single class of custom objects (guitars). I'm using the ssd_mobilenet_v1_coco model with a dataset of 1000K pre-labeled images downloaded from the Open Images dataset.
I am following this answer to try to improve the detection of small objects in an image.

As suggested there, I wanted to add an extra feature map (Conv2d_5_pointwise) to the existing ones, for a total of 7 feature maps. So I modified "models/ssd_mobilenet_v1_feature_extractor.py" as follows:

    # Conv2d_5_pointwise is added as the first (highest-resolution) feature map.
    feature_map_layout = {
        'from_layer': ['Conv2d_5_pointwise', 'Conv2d_11_pointwise', 'Conv2d_13_pointwise',
                       '', '', '', ''][:self._num_layers],
        'layer_depth': [-1, -1, -1, 512, 256, 256, 128][:self._num_layers],
        'use_explicit_padding': self._use_explicit_padding,
        'use_depthwise': self._use_depthwise,
    }
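Every list in this layout is truncated with `[:self._num_layers]`, so the number of feature maps depends on that attribute as well as on the list contents. The plain-Python sketch below needs no TensorFlow; `build_layout` is a hypothetical helper that mirrors only the slicing, not the real feature extractor, and counts how many feature maps the layout describes for a given `num_layers`.

```python
def build_layout(num_layers):
    """Mirror only the list slicing of the modified feature_map_layout.

    Hypothetical helper for illustration; it does not build any layers.
    """
    return {
        'from_layer': ['Conv2d_5_pointwise', 'Conv2d_11_pointwise',
                       'Conv2d_13_pointwise', '', '', '', ''][:num_layers],
        'layer_depth': [-1, -1, -1, 512, 256, 256, 128][:num_layers],
    }


for n in (6, 7):
    layout = build_layout(n)
    # Each 'from_layer' entry yields one feature map: a named entry reuses a
    # MobileNet layer, an empty string means a new conv layer is appended.
    print(n, len(layout['from_layer']), layout['layer_depth'])
```

With `num_layers` left at 6, the last entry (the 128-deep layer) is silently dropped, so the layout still describes only 6 feature maps even though 7 are listed.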

Accordingly, I also changed num_layers to 7 in the config file:


    anchor_generator {
      ssd_anchor_generator {
        num_layers: 7
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
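The config's `num_layers` determines how many per-layer entries the anchor generator reports through `num_anchors_per_location()`. Below is a simplified pure-Python model of how an SSD anchor generator assigns (scale, aspect ratio) box specs to each layer. The function name and structure are my own, and it assumes the defaults `reduce_boxes_in_lowest_layer=True` and `interpolated_scale_aspect_ratio=1.0`; treat it as a sketch, not the library code.

```python
import math


def ssd_anchor_specs(num_layers, min_scale, max_scale, aspect_ratios,
                     reduce_boxes_in_lowest_layer=True,
                     interpolated_scale_aspect_ratio=1.0):
    """Return one list of (scale, aspect_ratio) box specs per layer (sketch)."""
    # One scale per layer, spaced linearly from min_scale to max_scale; the
    # trailing 1.0 is only used to interpolate the last layer's extra box.
    scales = [min_scale + (max_scale - min_scale) * i / (num_layers - 1)
              for i in range(num_layers)] + [1.0]
    specs = []
    for layer in range(num_layers):
        scale, scale_next = scales[layer], scales[layer + 1]
        if layer == 0 and reduce_boxes_in_lowest_layer:
            # The lowest layer gets only 3 boxes.
            layer_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]
        else:
            layer_specs = [(scale, ar) for ar in aspect_ratios]
            if interpolated_scale_aspect_ratio > 0.0:
                # One extra box between this layer's scale and the next one.
                layer_specs.append((math.sqrt(scale * scale_next),
                                    interpolated_scale_aspect_ratio))
        specs.append(layer_specs)
    return specs


specs = ssd_anchor_specs(7, 0.2, 0.95, [1.0, 2.0, 0.5, 3.0, 0.3333])
num_anchors_per_location = [len(s) for s in specs]
print(num_anchors_per_location)  # [3, 6, 6, 6, 6, 6, 6]
```

The anchor side therefore expects exactly 7 feature maps once the config says `num_layers: 7`, regardless of what the feature extractor actually produces.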

However, when I try to train the model with model_main.py, I get this error:


      File "/home/carlo/projects/tf_models/research/object_detection/core/anchor_generator.py", line 105, in generate
        raise ValueError('Number of feature maps is expected to equal the length '
    ValueError: Number of feature maps is expected to equal the length of `num_anchors_per_location`.
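The error comes from a length comparison between the feature maps the extractor returns and the per-layer anchor counts. The sketch below reproduces that comparison in isolation; `check_feature_maps` is a hypothetical stand-in, not the library function, and the grid sizes are only illustrative values for a 300x300 input (only their count matters).

```python
def check_feature_maps(feature_map_shape_list, num_anchors_per_location):
    """Hypothetical stand-in for the length check that raises the error above."""
    if len(feature_map_shape_list) != len(num_anchors_per_location):
        raise ValueError('Number of feature maps is expected to equal the length '
                         'of `num_anchors_per_location`.')


# The extractor still truncates its layout to 6 entries, while the config
# asks the anchor generator for 7 layers.
feature_map_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (2, 2)]
num_anchors_per_location = [3, 6, 6, 6, 6, 6, 6]

try:
    check_feature_maps(feature_map_shapes, num_anchors_per_location)
except ValueError as err:
    print(f'{len(feature_map_shapes)} feature maps vs '
          f'{len(num_anchors_per_location)} anchor layers: {err}')
```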

Should I modify anything else to make it work? Thanks!

Solution

OK, I figured it out.

I simply had to change one more parameter, num_layers, in the constructor of the SSDMobileNetV1FeatureExtractor class:

    def __init__(self,
                 is_training,
                 depth_multiplier,
                 min_depth,
                 pad_to_multiple,
                 conv_hyperparams_fn,
                 reuse_weights=None,
                 use_explicit_padding=False,
                 use_depthwise=False,
                 num_layers=7,  # <--- HERE
                 override_base_feature_extractor_hyperparams=False):

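so that it matches the new number of feature maps. Every list in feature_map_layout is sliced with [:self._num_layers], so with the old default of 6 the seventh entry was silently dropped: the extractor produced only 6 feature maps, while the anchor generator, configured with num_layers: 7, expected 7.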