python, tensorflow, pytorch, loss, cross-entropy

Why do the TensorFlow and PyTorch cross-entropy losses return different values for the same example?


I have computed the cross-entropy loss for the same example with TensorFlow and with PyTorch's CrossEntropyLoss, but they return different values and I don't know why. Please find the code and results below. Thanks for your input and help.

import tensorflow as tf
import numpy as np

y_true = [3, 3, 1]
y_pred = [
    [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
    [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
    [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
]

scce3 = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.keras.losses.Reduction.AUTO)
loss3 = scce3(y_true, y_pred).numpy()
print(loss3)

The result of the above is: 1.69

PyTorch loss:

from torch import nn
import torch
loss = nn.CrossEntropyLoss()
y_true = torch.Tensor([3, 3, 1]).long()
y_pred = torch.Tensor([
    [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
    [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
    [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
])
loss2 = loss(y_pred, y_true)
print(loss2)

The loss value of the above is: 1.5


Solution

  • TensorFlow's SparseCategoricalCrossentropy expects probabilities as inputs by default (i.e. values after a tf.nn.softmax operation), whereas PyTorch's CrossEntropyLoss expects raw scores, more commonly called logits. If you apply the softmax operation to the TensorFlow inputs, the values are the same:

    import tensorflow as tf
    import numpy as np
    
    y_true = [3, 3, 1]
    y_pred = [
        [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
        [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
        [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
    ]
    
    scce3 = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.keras.losses.Reduction.AUTO)
    loss3 = scce3(y_true, tf.nn.softmax(y_pred)).numpy()
    print(loss3)
    
    >>> 1.5067214
    
    from torch import nn
    import torch
    loss = nn.CrossEntropyLoss()
    y_true = torch.Tensor([3, 3, 1]).long()
    y_pred = torch.Tensor([
        [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
        [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
        [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
    ])
    loss2 = loss(y_pred, y_true)
    print(loss2)
    
    >>> tensor(1.5067)
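
    As a cross-check, the shared value can also be reproduced by hand: apply softmax to each row, take the negative log of the probability at the true class index, and average over the examples. A minimal NumPy sketch of that calculation (the variable names are illustrative):

    import numpy as np

    y_true = np.array([3, 3, 1])
    y_pred = np.array([
        [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
        [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
        [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
    ])

    # Softmax over each row of logits.
    probs = np.exp(y_pred) / np.exp(y_pred).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the true class, averaged over the examples.
    nll = -np.log(probs[np.arange(len(y_true)), y_true]).mean()
    print(nll)  # ~1.5067, matching the outputs above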
    

    Using the raw inputs (logits) is usually advised for numerical stability, since the loss can then be computed with the LogSumExp trick. If you are using TensorFlow, I'd suggest using the tf.nn.softmax_cross_entropy_with_logits function instead, or its sparse counterpart. Edit: The SparseCategoricalCrossentropy class also has a keyword argument from_logits, which defaults to False and can be set to True to the same effect, as sketched below.
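
    A minimal sketch of both logits-based options on the same y_true and y_pred from above; each should print approximately 1.5067, matching PyTorch. The tf.constant wrapping and the explicit mean reduction are additions for illustration, not part of the original code:

    import tensorflow as tf

    y_true = [3, 3, 1]
    y_pred = [
        [0.3377, 0.4867, 0.8842, 0.0854, 0.2147],
        [0.4853, 0.0468, 0.6769, 0.5482, 0.1570],
        [0.0976, 0.9899, 0.6903, 0.0828, 0.0647]
    ]

    # Option 1: keep the Keras loss but tell it the inputs are logits.
    scce_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    print(scce_logits(y_true, y_pred).numpy())  # ~1.5067

    # Option 2: the low-level sparse op; it returns one loss per example,
    # so take the mean to mirror the default reduction.
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.constant(y_true), logits=tf.constant(y_pred))
    print(tf.reduce_mean(per_example).numpy())  # ~1.5067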