Tags: keras, tensorflow2.0, tf.keras, cross-entropy

Why is the result of categorical cross entropy in TensorFlow different from the definition?


I am testing the output of tf.keras.losses.CategoricalCrossentropy, and it gives me values different from what the definition predicts. My understanding of cross entropy is:


import tensorflow as tf

def ce_loss_def(y_true, y_pred):
    return tf.reduce_sum(-tf.math.multiply(y_true, tf.math.log(y_pred)))

And let's say I have values like this:

pred = [0.1, 0.1, 0.1, 0.7]
target = [0, 0, 0, 1]
pred = tf.constant(pred, dtype = tf.float32)
target = tf.constant(target, dtype = tf.float32)

pred_2 = [0.1, 0.3, 0.1, 0.7]  # only a non-target probability changed; the target probability is still 0.7
target = [0, 0, 0, 1]
pred_2 = tf.constant(pred_2, dtype = tf.float32)
target = tf.constant(target, dtype = tf.float32)

By this definition, I think the loss should disregard the probabilities of the non-target classes, so both predictions should give the same result:

ce_loss_def(y_true = target, y_pred = pred), ce_loss_def(y_true = target, y_pred = pred_2)

(<tf.Tensor: shape=(), dtype=float32, numpy=0.35667497>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.35667497>)

But tf.keras.losses.CategoricalCrossentropy doesn't give me the same results:

ce_loss_keras = tf.keras.losses.CategoricalCrossentropy()

ce_loss_keras(y_true = target, y_pred = pred), ce_loss_keras(y_true = target, y_pred = pred_2)

outputs:

(<tf.Tensor: shape=(), dtype=float32, numpy=0.35667497>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.5389965>)

What am I missing?

Here is the link to the notebook I used to get this result: https://colab.research.google.com/drive/1T69vn7MCGMSQ8hlRkyve6_EPxIZC1IKb#scrollTo=dHZruq-PGyzO


Solution

  • I found out what the problem was. The prediction vector gets rescaled automatically so that it sums to 1, because the values are treated as probabilities. When both predictions already sum to 1, the loss depends only on the target-class probability, as shown below (a sketch after the output reproduces the original numbers from the question):

    import tensorflow as tf
    
    ce_loss = tf.keras.losses.CategoricalCrossentropy()
    
    pred = [0.05, 0.2, 0.25, 0.5]
    target = [0, 0, 0, 1]
    pred = tf.constant(pred, dtype = tf.float32)
    target = tf.constant(target, dtype = tf.float32)
    
    pred_2 = [0.1, 0.3, 0.1, 0.5] # pred_2 distributes the non-target probability differently, but P(target) is still 0.5
    target = [0, 0, 0, 1]
    pred_2 = tf.constant(pred_2, dtype = tf.float32)
    target = tf.constant(target, dtype = tf.float32)
    
    c1, c2 = ce_loss(y_true = target, y_pred = pred), ce_loss(y_true = target, y_pred = pred_2)
    print("CE loss at dafault value: {}. CE loss with different probability of non-target classes:{}".format(c1,c2))
    

    gives

    
    CE loss at default value: 0.6931471824645996. 
    CE loss with different probability of non-target classes:0.6931471824645996
    
    

    As intended.
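
    To connect this back to the numbers in the question: a minimal sketch (assuming the same TensorFlow 2.x setup) that divides each prediction by its sum before applying the manual definition reproduces the Keras results, including the 0.5389965 value for the original pred_2, whose entries sum to 1.2.

    import tensorflow as tf

    def ce_loss_def(y_true, y_pred):
        return tf.reduce_sum(-tf.math.multiply(y_true, tf.math.log(y_pred)))

    target = tf.constant([0, 0, 0, 1], dtype=tf.float32)
    pred = tf.constant([0.1, 0.1, 0.1, 0.7], dtype=tf.float32)    # sums to 1.0
    pred_2 = tf.constant([0.1, 0.3, 0.1, 0.7], dtype=tf.float32)  # sums to 1.2

    # Rescale each prediction so it sums to 1, mirroring what Keras does with
    # probability inputs, then apply the manual definition.
    pred_norm = pred / tf.reduce_sum(pred)        # unchanged: already sums to 1
    pred_2_norm = pred_2 / tf.reduce_sum(pred_2)  # target entry becomes 0.7 / 1.2

    print(ce_loss_def(target, pred_norm))    # ~0.35667497, matches Keras
    print(ce_loss_def(target, pred_2_norm))  # ~0.5389965,  matches Keras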