I'm new with Pytorch and I need a clarification on multiclass classification.
I'm fine-tuning the DenseNet neural network, so it can recognize 3 different classes.
Because it's a multiclass problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
By reading on Pytorch forum, I found that CrossEntropyLoss applys the softmax function on the output of the neural network. Is this true? Should I remove the Softmax activation function from the structure of the network?
And what about the test phase? If it's included, I have to call the softmax function on the output of the model?
Thanks in advance for your help.
Yes, CrossEntropyLoss
applies softmax implicitly. You should remove the softmax layer at the end of the network since softmax is not idempotent, therefore applying it twice would be a semantic error.
As far as evaluation/testing goes. Remember that softmax is a monotonically increasing operation (meaning the relative order of outputs doesn't change when you apply it). Therefore the result of argmax before and after softmax will give the same result.
The only time you may want to perform softmax explicitly during evaluation would be if you need the actual confidence value for some reason. If needed you can apply softmax explicitly using torch.softmax
on the network output during evaluation.