ResNet-101 FeatureMap shape


I am new to CNNs and am having trouble learning them.

I'm trying to extract a CNN feature map using ResNet-101, and I want its shape to be [2048, 14, 14]. To get the feature map, I removed the last layer of the ResNet-101 model and adjusted the adaptive average pooling, but the output has shape torch.Size([1, 2048, 1, 1]).

I want torch.Size([1, 2048, 14, 14]), not torch.Size([1, 2048, 1, 1]).

Can anyone help me get this result? Thanks.

import torch

# load the resnet101 model and remove the last (fully connected) layer
model = torch.hub.load('pytorch/vision:v0.5.0', 'resnet101', pretrained=True)
model = torch.nn.Sequential(*(list(model.children())[:-1]))


#extract feature map from an image and print the size of the feature map
from PIL import Image
import matplotlib.pylab as plt
from torchvision import transforms

filename = 'KM_0000000009.jpg'
input_image = Image.open(filename)

preprocess = transforms.Compose([
    transforms.Resize((244,244)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)

input_tensor = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

with torch.no_grad():
    output = model(input_tensor)

print(output.size()) #torch.Size([1, 2048, 1, 1])

Solution

  • You were one step from what you wanted.

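    First things first: you should always check the module's source code (for ResNet it lives in torchvision/models/resnet.py). The forward pass may use functional operations (e.g. from the torch.nn.functional module), in which case the model cannot be transferred directly to torch.nn.Sequential. Luckily, for ResNet101 the part you need can be: the only such call is a torch.flatten right before the final fully connected layer, which you are removing anyway.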

    Secondly, feature map sizes depend on the size of the input. For the standard ImageNet-like image size ([3, 224, 224]; note that your image size is different) there is no layer with output shape [2048, 14, 14], only [2048, 7, 7] or [1024, 14, 14].
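
The sizes above can be checked by hand with the usual convolution output formula, floor((n + 2p - k) / s) + 1. The following is a minimal sketch, assuming the standard ResNet layout (7x7 stride-2 stem convolution, 3x3 stride-2 max pool, then layer2 to layer4 each halving the size); the helper names `conv_out` and `resnet_spatial_sizes` are made up for illustration and are not part of torchvision:

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (uses floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


def resnet_spatial_sizes(n):
    """Height/width after each ResNet stage for a square n x n input (hypothetical helper)."""
    sizes = {}
    n = conv_out(n, kernel=7, stride=2, padding=3)  # conv1
    n = conv_out(n, kernel=3, stride=2, padding=1)  # maxpool
    sizes["layer1"] = n  # layer1 does not downsample
    for name in ("layer2", "layer3", "layer4"):
        # each of these stages downsamples once with stride 2
        n = conv_out(n, kernel=3, stride=2, padding=1)
        sizes[name] = n
    return sizes


for side in (224, 244, 448):
    print(side, resnet_spatial_sizes(side))
```

For a 224 input this gives 14 after layer3 (1024 channels) and 7 after layer4 (2048 channels), which is why no stage produces [2048, 14, 14] at that resolution.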

    Thirdly, there is no need to use torch.hub for ResNet101 as it uses torchvision models under the hood anyway.

    With all that in mind:

    import torch
    import torchvision
    
    # load resnet101 (randomly initialized, which is enough to check shapes)
    # and remove the last three children: layer4, avgpool and fc
    model = torchvision.models.resnet101()
    model = torch.nn.Sequential(*(list(model.children())[:-3]))
    
    # image-like
    image = torch.randn(1, 3, 224, 224)
    
    with torch.no_grad():
        output = model(image)
    
    print(output.size())  # torch.Size([1, 1024, 14, 14])
    

    If you would like [2048, 7, 7], use [:-2] instead of [:-3]. The example below also shows how the feature map size changes with the image size:

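    # reload the full model first: `model` from the previous snippet is already truncated
    model = torchvision.models.resnet101()
    model = torch.nn.Sequential(*(list(model.children())[:-2]))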
    # Image twice as big -> twice as big height and width of features!
    image = torch.randn(1, 3, 448, 448)
    
    with torch.no_grad():
        output = model(image)
    
    print(output.size())  # torch.Size([1, 2048, 14, 14])