Tags: python, tensorflow, keras, deep-learning, loss-function

tf.keras.losses.categorical_crossentropy() does not output what it should output


I am trying to train a CNN classifier with 3 classes and am troubleshooting my loss function. I am testing tf.keras.losses.CategoricalCrossentropy() and tf.keras.losses.categorical_crossentropy(), following the standalone usage guide from the TensorFlow documentation.

I don't think I am getting the outputs I should. When I input y_true=[0.,1.,0.] and y_pred=[1.,0.,0.], I expect a loss of infinity (printed as nan or inf by the program), but the output I receive is 16.118095. When the prediction matches the label (i.e. y_true=[1.,0.,0.] and y_pred=[1.,0.,0.]) the output is 1.192093e-07, even though I would expect exactly 0.

I am really perplexed by this behavior. The same happens with length-1 vectors: for y_true=[1.] and y_pred=[0.] the loss is nan, when the prediction matches (y_true=[1.] and y_pred=[1.]) I receive 1.192093e-07, and for y_true=[0.] and y_pred=[0.] the result is also nan.

To make the results more readable, here is a summary of the inputs, the outputs I actually get, and the outputs I expect:

y_true        y_pred        Actual Output   What I Expect
[0.,1.,0.]    [1.,0.,0.]    16.118095       nan or infinity
[1.,0.,0.]    [1.,0.,0.]    1.192093e-07    true 0
[0.,1.]       [1.,0.]       16.118095       nan or infinity
[1.,0.]       [1.,0.]       1.192093e-07    true 0
[1.]          [0.]          nan             nan or infinity
[1.]          [1.]          1.192093e-07    true 0
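
For reference, a minimal snippet along these lines reproduces the numbers above (TF 2.x, eager execution; the variable names are just for illustration):

    import tensorflow as tf

    cases = [
        ([0., 1., 0.], [1., 0., 0.]),
        ([1., 0., 0.], [1., 0., 0.]),
        ([0., 1.], [1., 0.]),
        ([1., 0.], [1., 0.]),
        ([1.], [0.]),
        ([1.], [1.]),
    ]

    for y_true, y_pred in cases:
        # Reduction is over the last axis, so each call returns a scalar.
        loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        print(y_true, y_pred, loss.numpy())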

I am sorry if this is a trivial question, but I really don't know why I am getting these results. I think something is wrong, because I am only getting 16 and not infinity; if nothing is actually wrong, I'd like the reassurance, and if I am mistaken, I would really appreciate the correction.


Solution

  • The reason is that tf.keras.losses.categorical_crossentropy clips y_pred to the range [1e-7, 1 - 1e-7] before taking the logarithm, so an exact 0 or 1 never reaches log(). That is why you get 16.118095, which is -log(1e-7), instead of infinity, and 1.192093e-07, which is roughly -log(1 - 1e-7) in float32, instead of a true 0.
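
    A quick sanity check (float32, the Keras default) shows that both numbers are just the log of the clipped endpoints:

    import tensorflow as tf

    eps = tf.constant(1e-7)

    # The true class was predicted as 0 and gets clipped up to eps:
    print(-tf.math.log(eps).numpy())      # 16.118095
    # The true class was predicted as 1 and gets clipped down to 1 - eps:
    print(-tf.math.log(1 - eps).numpy())  # 1.1920929e-07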

    import tensorflow as tf
    
    def categorical_crossentropy(y_true, y_pred, clip=False):
        # Optionally clip predictions away from 0 and 1, mirroring what Keras does.
        if clip:
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
        # nansum skips the nan terms that 0 * log(0) produces, but keeps +/-inf.
        return -tf.experimental.numpy.nansum(y_true * tf.math.log(y_pred))
    
    y_true = [0., 1., 0.]
    y_pred = [1., 0., 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 16.118095
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 16.118095
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # inf
    
    y_true = [1., 0., 0.]
    y_pred = [1., 0., 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 1.1920929e-07
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 1.1920929e-07
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # -0.0
    
    y_true = [0., 1., 0.]
    y_pred = [0.05, 0.95, 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 0.051293306
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 0.051293306
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # 0.051293306
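
    As a side note, beyond the question itself: the usual way to avoid this saturation in training is to skip probabilities altogether and pass raw logits with from_logits=True, which computes the softmax and log in one numerically stable step and needs no clipping:

    loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    # y_pred is now an unbounded score vector, not probabilities.
    print(loss_fn([[0., 1., 0.]], [[-10., 10., -10.]]).numpy())
    # ~0.0 -- no epsilon floor, so a confident correct prediction
    # can reach (numerically) zero loss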