image-processing machine-learning computer-vision pytorch object-detection

Use only certain layers of pretrained torchvision network

I'm trying to use only certain layers in a pretrained torchvision Faster-RCNN network initialized by:

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

This works. However, passing model.modules() or model.children() into an nn.Sequential yields an error. Even passing the whole model leads to errors, e.g.

model = torch.nn.Sequential(*model.modules())
model.eval()
# x is a [C, H, W] image
y = model(x)

leads to

AttributeError: 'dict' object has no attribute 'dim'

and

model = torch.nn.Sequential(*model.children())
model.eval()
# x is a [C, H, W] image
y = model(x)

leads to

TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple

This confuses me because I have modified other PyTorch pretrained models like that in the past. How can I use the FasterRCNN pretrained model to create a new (pretrained) model that uses only certain layers, e.g. all layers but the last one?

Solution

Unlike other simple CNN models, it is not trivial to convert an R-CNN based detector to a simple nn.Sequential model. If you look at the functionality of R-CNN ('generalized_rcnn.py') you'll see that the output features (computed by the FCN backbone) are not just passed to the RPN component, but rather combined with the input image and even with the targets (during training).

Therefore, I suppose if you want to change the way faster R-CNN behaves, you'll have to use the base class torchvision.models.detection.FasterRCNN() and provide it with different roi pooling parameters.