python deep-learning object-detection mxnet

Is there a way to modify the layers of a GluonCV pretrained object detection model?

Good day, StackOverflow.

I want to replace the convolutional layers of a pretrained GluonCV object detection model with deformable convolutional layers. Specifically, I want to replace the convolutional layers inside the CNN that the detector uses for feature extraction. I am targeting the Faster RCNN and SSD detection models.

I have tried the following snippet of code:

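import mxnet as mx
from mxnet import gluon

def replace_conv2D(net):
    # Walk the direct children; recurse into anything that is not a Conv2D.
    for key, layer in net._children.items():
        if isinstance(layer, gluon.nn.Conv2D):
            # Build the replacement from the old layer's settings
            # (this snippet uses a plain Conv2D with halved channels).
            new_conv = gluon.nn.Conv2D(
                channels=layer._channels // 2,
                kernel_size=layer._kwargs['kernel'],
                strides=layer._kwargs['stride'],
                padding=layer._kwargs['pad'],
                in_channels=layer._in_channels // 2)
            # Registering under the same key overwrites the old child.
            with net.name_scope():
                net.register_child(new_conv, key)
            new_conv.initialize(mx.init.Xavier())
        else:
            replace_conv2D(layer)

net = gluon.model_zoo.vision.get_model("resnet18_v1", pretrained=True)
replace_conv2D(net)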

I then tried to verify that the model's convolutional layers were replaced using:

# Named differently from replace_conv2D so it does not overwrite it.
def print_children(net):
    for key, layer in net._children.items():
        print(f"{key}:{layer}")

print_children(net)
However, I can only confirm that this works on image classification models, not on my object detection models.

For example, it works for a basic ResNet50 model (image classification):

Before (ResNet50):

ResNetV1(
  (features): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)

After (ResNet50):

ResNetV1(
  (features): HybridSequential(
    (0): DeformableConvolution(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
    (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
)

But for an SSD_resnet50 model (object detection), I get the following output for the first layer:

features FeatureExpander(
<Symbol group [ssd2_resnetv10_stage3_activation5, ssd2_resnetv10_stage4_activation2, ssd2_expand_reu0, ssd2_expand_reu1, ssd2_expand_reu2, ssd2_expand_reu3]> : 1 -> 6
)

After the method has run, I do not observe any changes:

features FeatureExpander(
<Symbol group [ssd2_resnetv10_stage3_activation5, ssd2_resnetv10_stage4_activation2, ssd2_expand_reu0, ssd2_expand_reu1, ssd2_expand_reu2, ssd2_expand_reu3]> : 1 -> 6
)

Solution

I solved the issue by performing the following steps:

1. Copy the source code of the SSD model from GluonCV's GitHub repo into its own class file.
2. Modify the class file to incorporate the needed changes (in my case, replacing certain convolutions with deformable convolutions) to create the modified version.
3. Load the original SSD model, then save its parameters as 'transfer.params'.
4. Create the modified SSD model, then load 'transfer.params' (the original model's weights) with allow_missing=True and ignore_extra=True.

The resulting modified model has pretrained weights wherever its layer names match those of the original model. The layers that were modified are the exception and do not receive pretrained weights.

These steps can be extended to modify the backbone network as needed.