Is the cross-entropy loss of PyTorch different from "categorical_crossentropy" in Keras?

I am trying to mimic a PyTorch neural network in Keras.

I am confident that my Keras version of the network is very close to the PyTorch one, but during training the loss values of the PyTorch network are much lower than those of the Keras network. I wonder whether this is because I have not copied the PyTorch network properly in Keras, or because the loss is computed differently in the two frameworks.

PyTorch loss definition:

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=..., momentum=0.9, weight_decay=5e-4)  # lr value is missing in the original post

Keras loss definition:

sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])

Note that all the layers in the Keras network use L2 regularization, kernel_regularizer=regularizers.l2(5e-4). I also used he_uniform initialization, which I believe is the default in PyTorch according to the source code.

The batch size is the same for both networks: 128.

In the PyTorch version, the loss starts around 4.1209 and decreases to around 0.5. In Keras it starts around 30 and decreases to 2.5.


  • Keras categorical_crossentropy uses from_logits=False by default, which means it assumes y_pred contains probabilities, not raw scores (source). You can choose whether or not to end the model with a softmax/sigmoid layer; just make sure to set the from_logits argument accordingly.

    PyTorch CrossEntropyLoss expects unnormalized scores (logits) for each class, not probabilities (source). Thus, if you use CrossEntropyLoss, you should not put a softmax/sigmoid layer at the end of your model.

    If this confuses you, please read this discuss.pytorch post.
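
To make the two input conventions concrete, here is a minimal framework-free sketch in plain Python. The helper names (softmax, ce_from_logits, ce_from_probs) and the sample scores are made up for illustration. They mirror the idea behind each loss, not the exact library implementations (for example, the small eps guard only roughly imitates how Keras avoids log(0)).

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_logits(logits, target):
    """PyTorch-style convention: input is raw scores, target is a class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def ce_from_probs(probs, one_hot, eps=1e-7):
    """Keras-style convention with from_logits=False: input is probabilities,
    target is a one-hot vector. eps is an illustrative guard against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))

# Hypothetical raw scores for one sample with three classes; true class is 0.
logits = [2.0, -1.0, 0.5]
target = 0
one_hot = [1.0, 0.0, 0.0]

# Correct pairings: logits -> logits-based loss, probabilities -> probability-based loss.
loss_logits = ce_from_logits(logits, target)
loss_probs = ce_from_probs(softmax(logits), one_hot)

# Mismatch: probabilities fed to a loss that expects logits (softmax applied twice).
loss_double = ce_from_logits(softmax(logits), target)

print(loss_logits, loss_probs, loss_double)
```

With matching conventions the two losses agree up to floating-point error, while applying softmax before a logits-based loss gives a different (here larger) value. The same kind of mismatch is one thing to rule out when comparing loss curves across the two frameworks.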