Tags: tensorflow, machine-learning, pytorch, loss-function, cross-entropy

How to calculate the correct cross entropy between two tensors in PyTorch when the target is not one-hot?


I am confused about the calculation of cross entropy in PyTorch. If I want to calculate the cross entropy between two tensors and the target tensor is not a one-hot label, which loss should I use? It is quite common to calculate the cross entropy between two probability distributions rather than between a predicted result and a fixed one-hot label.

The basic loss function CrossEntropyLoss requires the target to be an integer class index, so it does not apply in this case. BCELoss seems to accept this kind of input, but it gives an unexpected result. The expected formula for the cross entropy is

-Σi yi*log(pi)

But BCELoss calculates the BCE of each dimension, which is expressed as

-yi*log(pi)-(1-yi)*log(1-pi)

Compared with the first equation, the term -(1-yi)*log(1-pi) should not be involved. Here is an example using BCELoss, where we can see that the second term is included in each dimension's result, and that makes the result differ from the correct one.

import torch.nn as nn
import torch
from math import log

a = torch.Tensor([0.1, 0.2, 0.7])   # predicted distribution
y = torch.Tensor([0.2, 0.2, 0.6])   # target distribution
L = nn.BCELoss(reduction='none')    # keep the per-dimension losses
y1 = -0.2 * log(0.1) - 0.8 * log(0.9)  # manual BCE for the first dimension only
print(L(a, y))
print(y1)

And the result is

tensor([0.5448, 0.5004, 0.6956])
0.5448054311250702
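
Summing the per-dimension values gives:

print(L(a, y).sum())   # tensor(1.7408), which is not what the first formula gives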

This total does not correspond to the expected cross entropy, because every dimension includes the extra -(1-yi)*log(1-pi) term. In contrast, TensorFlow calculates the correct cross entropy value with CategoricalCrossentropy. Here is an example with the same setting, where we can see the cross entropy is calculated in the same way as the first formula.

import tensorflow as tf
from math import log

L = tf.losses.CategoricalCrossentropy()
a = tf.convert_to_tensor([0.1, 0.2, 0.7])   # predicted distribution
y = tf.convert_to_tensor([0.2, 0.2, 0.6])   # target distribution
y_ = -0.2 * log(0.1) - 0.2 * log(0.2) - 0.6 * log(0.7)  # manual cross entropy

print(L(y, a), y_)

And the output is

tf.Tensor(0.9964096, shape=(), dtype=float32) 0.9964095674488687

Is there any function that can calculate the correct cross entropy in PyTorch, using the first formula, just like CategoricalCrossentropy in TensorFlow?


Solution

  • The fundamental problem is that you are incorrectly using the BCELoss function.

    Cross-entropy loss is what you want. It is used to compute the loss between two arbitrary probability distributions. Indeed, its definition is exactly the equation that you provided:

    H(p, q) = -Σx p(x)*log(q(x))

    where p is the target distribution and q is your predicted distribution. See this StackOverflow post for more information.

    In your example where you provide the line

    y = tf.convert_to_tensor([0.2, 0.2, 0.6])
    

    you are implicitly modeling a multi-class classification problem where the target class can be one of three classes (the length of that tensor). More specifically, that line is saying that for this one data instance, class 0 has probability 0.2, class 1 has probability 0.2, and class 2 has probability 0.6.

    The problem you are having is that PyTorch's BCELoss computes the binary cross-entropy loss, which is formulated differently. Binary cross-entropy loss computes the cross-entropy for classification problems where the target class can be only 0 or 1.

    In binary cross-entropy, you only need one probability, e.g. 0.2, meaning that the probability of the instance being class 1 is 0.2. Correspondingly, class 0 has probability 0.8.

    If you give the same tensor [0.2, 0.2, 0.6] to BCELoss, you are modeling a situation with three data instances: data instance 0 has probability 0.2 of being class 1, data instance 1 has probability 0.2 of being class 1, and data instance 2 has probability 0.6 of being class 1.
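
    You can check this interpretation directly: with reduction='none', every element of the BCELoss output equals -yi*log(pi) - (1-yi)*log(1-pi) computed independently. Here is a minimal sketch reusing the tensors from your question:

    import torch
    import torch.nn as nn

    p = torch.tensor([0.1, 0.2, 0.7])   # treated as three independent predictions
    y = torch.tensor([0.2, 0.2, 0.6])   # treated as three independent binary targets
    bce = nn.BCELoss(reduction='none')(p, y)
    manual = -y * torch.log(p) - (1 - y) * torch.log(1 - p)
    print(bce)     # tensor([0.5448, 0.5004, 0.6956])
    print(manual)  # the same values, computed element by element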

    Now, to your original question:

    If I want to calculate the cross entropy between 2 tensors and the target tensor is not a one-hot label, which loss should I use?

    Unfortunately, PyTorch does not have a cross-entropy function that takes in two probability distributions. See this question: https://discuss.pytorch.org/t/how-should-i-implement-cross-entropy-loss-with-continuous-target-outputs/10720

    The recommendation is to implement your own function using its equation definition. Here is code that works:

    import torch

    def cross_entropy(input, target):
        # mean over the batch of -sum_i target_i * log(input_i)
        return torch.mean(-torch.sum(target * torch.log(input), 1))


    y = torch.Tensor([[0.2, 0.2, 0.6]])      # target distribution
    yhat = torch.Tensor([[0.1, 0.2, 0.7]])   # predicted distribution
    print(cross_entropy(yhat, y))
    # tensor(0.9964)
    

    It provides the answer that you wanted.
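
    If your model produces raw logits rather than probabilities, a numerically safer variant of the same idea (a sketch along these lines, not something built into PyTorch) applies log_softmax to the logits instead of calling log on already-normalized probabilities:

    import torch
    import torch.nn.functional as F

    def cross_entropy_with_logits(logits, target):
        # log_softmax normalizes and takes the log in one numerically stable step
        return torch.mean(torch.sum(-target * F.log_softmax(logits, dim=1), dim=1))

    # logits whose softmax recovers [0.1, 0.2, 0.7]
    logits = torch.log(torch.Tensor([[0.1, 0.2, 0.7]]))
    target = torch.Tensor([[0.2, 0.2, 0.6]])
    print(cross_entropy_with_logits(logits, target))  # tensor(0.9964)

    On already-normalized probabilities (as in this example, where the logits are just log-probabilities), both versions give the same value; the logit version is mainly useful when the inputs come straight from a linear layer and have not been passed through softmax.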