When I am building a classifier in PyTorch, I have two options:

1. nn.CrossEntropyLoss, without any modification to the model, or
2. nn.NLLLoss, with F.log_softmax added as the last layer of the model.

So there are two approaches; a minimal sketch of both is below.
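A rough sketch of what I mean (the layer sizes, model names, and the 10-feature/3-class shapes are arbitrary placeholders, not from any real model):

```python
import torch.nn as nn
import torch.nn.functional as F

# Option 1: the model emits raw logits; CrossEntropyLoss does the rest.
model_a = nn.Sequential(nn.Linear(10, 3))   # toy net: 10 features -> 3 classes
criterion_a = nn.CrossEntropyLoss()

# Option 2: the model ends in log_softmax; pair it with NLLLoss.
class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 3)

    def forward(self, x):
        return F.log_softmax(self.fc(x), dim=1)

model_b = ModelB()
criterion_b = nn.NLLLoss()
```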
Now, which approach should one use, and why?
They're the same.
If you check the implementation, you will find that nn.CrossEntropyLoss calls nll_loss after applying log_softmax to its input:
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
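So the two formulations should give numerically identical losses. A quick sanity check (the tensor values and shapes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # batch of 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])    # class indices

# Approach 1: CrossEntropyLoss on raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Approach 2: NLLLoss on log-probabilities.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_nll))  # True
```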
Edit: it seems the links are now broken; the C++ implementation shows the same thing.