Consider the following code:
from torch import nn
from torchsummary import summary
from torchvision import models
model = models.efficientnet_b7(pretrained=True)
model.classifier[-1].out_features = 4  # because I have a 4-class problem; initially the output is 1000 classes
model.classifier = nn.Sequential(*model.classifier, nn.Softmax(dim=1)) # add softmax
# freeze features
for child in model.features:
    for param in child.parameters():
        param.requires_grad = False
When I run
model.classifier
I get the below (expected) output
which, as per my calculations, implies that the total number of trainable parameters should be (2560 + 1) * 4 output nodes = 10,244 trainable params.
However, when I attempt to calculate the total number of trainable params by
summary(model, (3,128,128))
I get
and by
sum(p.numel() for p in model.parameters() if p.requires_grad)
I also get 2,561,000. The 2,561,000, in both cases, comes from (2560 + 1) * 1000 classes.
But why does it still consider 1000 classes?
Resetting an attribute of an initialized layer does not necessarily re-initialize it with the newly-set attribute. Setting out_features = 4 only changes the stored attribute; the layer's weight and bias tensors still have shapes (1000, 2560) and (1000,), which is where the 2,561,000 trainable parameters come from. What you need is to replace the layer itself: model.classifier[-1] = nn.Linear(2560, 4).
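A minimal sketch of the fix, keeping the rest of your snippet unchanged (the 2560 input features and the 10,244 count follow from the numbers above; pretrained=True is kept from the question, though newer torchvision versions prefer the weights= argument):

from torch import nn
from torchvision import models

model = models.efficientnet_b7(pretrained=True)

# Setting the attribute alone leaves the old (1000, 2560) weight in place:
model.classifier[-1].out_features = 4
print(model.classifier[-1].weight.shape)  # torch.Size([1000, 2560]) -- still 1000 classes

# Replace the layer instead, so weight and bias are re-created as (4, 2560) and (4,):
model.classifier[-1] = nn.Linear(2560, 4)

# Then add softmax and freeze the features as in the original snippet:
model.classifier = nn.Sequential(*model.classifier, nn.Softmax(dim=1))
for param in model.features.parameters():
    param.requires_grad = False

print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # 10244 = (2560 + 1) * 4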