I am building a CNN in PyTorch using the pre-trained DenseNet121 model, and I am replacing its classifier with my own. I tried two ways of doing this. The first one works, but the second one raises an error during training. I need the second one because I want to add attention to the model. Why does the second one fail when both look the same?
First version (works correctly):
import torch.nn as nn
from collections import OrderedDict
from torchvision import models

model = models.densenet121(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze the pre-trained backbone

classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 512)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(512, 10)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
Second version (raises an error during training):
net = models.densenet121(pretrained=True)
for param in net.parameters():
    param.requires_grad = False

class AttnDenseNet121(nn.Module):
    def __init__(self, num_classes, normalize_attn=False, dropout=None):
        super(AttnDenseNet121, self).__init__()
        self.features = net.features
        self.classifier = nn.Sequential(OrderedDict([
            ('fc1', nn.Linear(1024, 512)),
            ('relu', nn.ReLU()),
            ('fc2', nn.Linear(512, 10)),
            ('output', nn.LogSoftmax(dim=1))
        ]))

    def forward(self, x):
        x = self.features(x)
        out = self.classifier(x)
        return out

model = AttnDenseNet121(num_classes=10, normalize_attn=True)
The training code is the same for both, and the batch size is 32.
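The failure can be reproduced without the backbone. For a 224x224 input, net.features returns a tensor of shape (N, 1024, 7, 7), since DenseNet121 downsamples by a factor of 32. nn.Linear only acts on the last dimension, which here has size 7 rather than 1024, so fc1 raises a shape-mismatch RuntimeError. The sketch below fakes the feature map with a random tensor (an assumption for illustration); the exact error text varies between PyTorch versions.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 512)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(512, 10)),
    ('output', nn.LogSoftmax(dim=1)),
]))

# Stand-in for net.features(x) on a batch of 32 images of size 224x224.
feature_maps = torch.randn(32, 1024, 7, 7)

try:
    classifier(feature_maps)  # Linear sees a last dimension of 7, not 1024
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)
```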
After exploring the torchvision source code, I found the implementation of DenseNet's forward pass. Some additional operations are applied between the features module and the classifier:
def forward(self, x: Tensor) -> Tensor:
    features = self.features(x)
    out = F.relu(features, inplace=True)
    out = F.adaptive_avg_pool2d(out, (1, 1))
    out = torch.flatten(out, 1)
    out = self.classifier(out)
    return out
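The first version works because assigning model.classifier keeps this forward method, while the second version defines its own forward and skips these steps. You should adopt the same pipeline to be consistent. The key step is adaptive_avg_pool2d: it averages each of the 1024 feature maps down to a single value, turning the output of features from shape (N, 1024, H, W) into (N, 1024, 1, 1). torch.flatten(out, 1) then reshapes this to (N, 1024), which is what the first nn.Linear(1024, 512) in your classifier expects. Without these steps, nn.Linear is applied to the last dimension of the 4-D feature tensor (W, which is 7 for a 224x224 input) instead of the 1024 channels, and that mismatch causes the error. Because the pooling is adaptive, these two functions fix the problem for any input size large enough to pass through DenseNet's downsampling layers.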