Tags: deep-learning, pytorch, loss-function, multiclass-classification, activation-function

Why does multi-class classification fail with sigmoid?


MNIST trained with Sigmoid fails while Softmax works fine

I am trying to investigate how different activation functions affect the final results, so I implemented a simple net for MNIST with PyTorch.

I am using NLLLoss (negative log-likelihood), since combined with log-softmax it implements Cross Entropy Loss.
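
A quick sanity check of that equivalence (mine, not from the original post): NLLLoss applied to log-softmax outputs gives the same value as CrossEntropyLoss applied to the raw logits.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # batch of 4 samples, 10 classes
targets = torch.tensor([3, 7, 0, 9])   # one class index per sample

ce = F.cross_entropy(logits, targets)                          # cross-entropy on raw logits
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)        # log-softmax + NLL
print(torch.allclose(ce, nll))                                 # True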

When I use softmax as the activation of the last layer, it works great. But when I use sigmoid instead, things fall apart.

Here is my network's forward pass:

def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 80)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.XXXX(x)

where XXXX is the activation function.
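
For reference, here is one possible __init__ that is consistent with the x.view(-1, 80) reshape above; the channel counts and hidden size are my assumptions, not taken from the original post:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed sizes: with 28x28 MNIST inputs, two 5x5 convs and two 2x2
        # max-pools leave a 4x4 feature map, so 5 output channels give the
        # 80 features expected by x.view(-1, 80).
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 5, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(80, 50)
        self.fc2 = nn.Linear(50, 10)   # 10 output classes for MNIST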

Both Sigmoid and Softmax output values in (0, 1). Yes, Softmax guarantees that the outputs sum to 1, but I am not sure whether that explains why training fails with Sigmoid. Is there some detail I am missing here?
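
To make the observation concrete, here is a small check (mine, not part of the question) showing that softmax produces a probability distribution over the classes, while sigmoid scores each class independently:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])
print(F.softmax(logits, dim=1).sum(dim=1))   # tensor([1.]) -- a proper probability distribution
print(torch.sigmoid(logits).sum(dim=1))      # ~1.77 here -- per-class scores, not constrained to sum to 1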


Solution

  • Sigmoid + binary cross-entropy can be used for multi-label classification (imagine a picture with both a dog and a cat; you want the model to return "dog and cat"). It works when the classes are not mutually exclusive, or when a sample contains more than one object that you want to recognize (see the sketch after this answer).

    In your case, MNIST has mutually exclusive classes and each image contains exactly one digit, so it is better to use log-softmax + negative log-likelihood, which assumes that the classes are mutually exclusive and that exactly one correct label is associated with each image.

    So you can't really expect that behavior from sigmoid.
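
As mentioned above, here is a short sketch contrasting the two setups; the tensor shapes and random targets are purely illustrative:

import torch
import torch.nn as nn

logits = torch.randn(4, 10)  # raw scores for 4 samples, 10 classes

# Multi-label setup (classes not mutually exclusive): one sigmoid per class,
# targets are 0/1 indicators, loss is binary cross-entropy per class.
multilabel_targets = torch.randint(0, 2, (4, 10)).float()
bce = nn.BCEWithLogitsLoss()           # applies sigmoid internally
print(bce(logits, multilabel_targets))

# Multi-class setup (MNIST): exactly one correct class per sample,
# loss is log-softmax + NLL, i.e. cross-entropy over the class dimension.
multiclass_targets = torch.randint(0, 10, (4,))
ce = nn.CrossEntropyLoss()             # applies log-softmax internally
print(ce(logits, multiclass_targets))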