python pytorch conv-neural-network densenet

RuntimeError: mat1 and mat2 shapes cannot be multiplied (229376x7 and 1024x512)

I am building a CNN in PyTorch using the pre-trained DenseNet121 model, and I am replacing its classifier with my own. I tried to do this in two ways. The first one works, but the second one raises the error above during training. I need the second one so that I can add attention to the model. Why does the second one fail when both appear to be the same?

First version, which works correctly:

model = models.densenet121(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 512)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(512, 10)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
model.classifier = classifier

Second version, which raises the error during training:

net = models.densenet121(pretrained=True)
for param in net.parameters():
    param.requires_grad = False

class AttnDenseNet121(nn.Module):
    def __init__(self, num_classes, normalize_attn=False, dropout=None):
        super(AttnDenseNet121, self).__init__()
        self.features = net.features
        self.classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 512)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(512, 10)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    def forward(self, x):
        x = self.features(x)
        out = self.classifier(x)
        return out
model = AttnDenseNet121(num_classes=10, normalize_attn=True)

The training code is the same for both, with a batch size of 32.

Solution

After exploring the torchvision source code, I found the implementation of DenseNet's forward pass. You can see that some additional operations are applied between the features module and the classifier:

def forward(self, x: Tensor) -> Tensor:
    features = self.features(x)
    out = F.relu(features, inplace=True)
    out = F.adaptive_avg_pool2d(out, (1, 1))
    out = torch.flatten(out, 1)
    out = self.classifier(out)
    return out
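To see what each of these steps does to the tensor shape, here is a minimal sketch that uses a random tensor as a stand-in for the output of self.features. It assumes 224x224 input images (not stated in the question, but consistent with the 7 in the error message) and the batch size of 32 mentioned above; the variable names are only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for self.features(x): batch of 32, 1024 feature maps of 7x7
# (assumes 224x224 inputs; this is an assumption, not from the question).
features = torch.randn(32, 1024, 7, 7)
fc1 = nn.Linear(1024, 512)

# nn.Linear acts on the LAST dimension (7 here) and folds all leading
# dimensions into rows: 32 * 1024 * 7 = 229376 rows of length 7.
try:
    fc1(features)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# The steps torchvision applies before the classifier:
out = F.relu(features)
pooled = F.adaptive_avg_pool2d(out, (1, 1))  # shape (32, 1024, 1, 1)
flat = torch.flatten(pooled, 1)              # shape (32, 1024)
hidden = fc1(flat)                           # shape (32, 512)
print(pooled.shape, flat.shape, hidden.shape)
```

The first call fails because the linear layer expects 1024 values in the last dimension and finds 7; after pooling and flattening, the last dimension is 1024 and the same layer accepts the input.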

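You should use the same pipeline in your own forward method. This is also why your first version works: it only swaps model.classifier, so torchvision's forward, including these extra steps, is still used. In the second version, the classifier receives the raw feature tensor of shape (32, 1024, 7, 7). nn.Linear operates on the last dimension, so it sees 32 * 1024 * 7 = 229376 rows of length 7, which is the 229376x7 in the error message. Note the adaptive_avg_pool2d call: it average-pools each feature map down to 1x1, so the output for each sample has shape (1024, 1, 1). torch.flatten then turns this into shape (1024,), which matches the input size of the first linear layer in the classifier. Because the pooling is adaptive, adding these two calls should solve your problem no matter what the input image size is.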