Tags: python, tensorflow, keras, deep-learning, loss-function

tf.keras.losses.categorical_crossentropy() does not output what it should output


I am trying to train a CNN classifier with 3 classes and am troubleshooting my loss function. I am testing tf.keras.losses.CategoricalCrossentropy() and tf.keras.losses.categorical_crossentropy(), following the standalone usage guide from the TensorFlow documentation.

I don't think I am getting the outputs I should. When I input y_true=[0.,1.,0.] and y_pred=[1.,0.,0.], I expect a loss of infinity (printed as nan or inf by the program), but the output I receive is 16.118095. When the prediction matches the label (i.e. y_true=[1.,0.,0.] and y_pred=[1.,0.,0.]) the output is 1.192093e-07, even though I would expect exactly 0.

I am really perplexed by this behavior. The same happens with length-1 vectors: for y_true=[1.] and y_pred=[0.] the loss is nan, when the prediction matches (y_true=[1.] and y_pred=[1.]) I receive 1.192093e-07, and for y_true=[0.] and y_pred=[0.] the result is also nan.

To make the results more readable, here is a summary of the inputs, the outputs I actually get, and the outputs I expect:

y_true        y_pred        Actual Output   What I Expect
[0.,1.,0.]    [1.,0.,0.]    16.118095       nan or infinity
[1.,0.,0.]    [1.,0.,0.]    1.192093e-07    true 0
[0.,1.]       [1.,0.]       16.118095       nan or infinity
[1.,0.]       [1.,0.]       1.192093e-07    true 0
[1.]          [0.]          nan             nan or infinity
[1.]          [1.]          1.192093e-07    true 0
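
For reference, a minimal snippet along these lines reproduces the numbers above (TF 2.x, eager execution; the variable names are just for illustration):

    import tensorflow as tf

    cases = [
        ([0., 1., 0.], [1., 0., 0.]),
        ([1., 0., 0.], [1., 0., 0.]),
        ([0., 1.], [1., 0.]),
        ([1., 0.], [1., 0.]),
        ([1.], [0.]),
        ([1.], [1.]),
    ]

    for y_true, y_pred in cases:
        # Reduction is over the last axis, so each call returns a scalar.
        loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        print(y_true, y_pred, loss.numpy())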

I am sorry if this is a trivial question, but I really don't know why I am getting these results. I think something is wrong, because I am only getting 16 and not infinity; if nothing is actually wrong, I'd like the reassurance, and if I am mistaken, I would really appreciate the correction.


Solution

  • The reason is that tf.keras.losses.categorical_crossentropy clips y_pred to the range [1e-7, 1 - 1e-7] before taking the logarithm, so an exact 0 or 1 never reaches log(). That is why you get 16.118095, which is -log(1e-7), instead of infinity, and 1.192093e-07, which is roughly -log(1 - 1e-7) in float32, instead of a true 0.
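
    A quick sanity check (float32, the Keras default) shows that both numbers are just the log of the clipped endpoints:

    import tensorflow as tf

    eps = tf.constant(1e-7)

    # The true class was predicted as 0 and gets clipped up to eps:
    print(-tf.math.log(eps).numpy())      # 16.118095
    # The true class was predicted as 1 and gets clipped down to 1 - eps:
    print(-tf.math.log(1 - eps).numpy())  # 1.1920929e-07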

    import tensorflow as tf
    
    def categorical_crossentropy(y_true, y_pred, clip=False):
        # Optionally clip predictions away from 0 and 1, mirroring what Keras does.
        if clip:
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
        # nansum skips the nan terms that 0 * log(0) produces, but keeps +/-inf.
        return -tf.experimental.numpy.nansum(y_true * tf.math.log(y_pred))
    
    y_true = [0., 1., 0.]
    y_pred = [1., 0., 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 16.118095
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 16.118095
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # inf
    
    y_true = [1., 0., 0.]
    y_pred = [1., 0., 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 1.1920929e-07
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 1.1920929e-07
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # -0.0
    
    y_true = [0., 1., 0.]
    y_pred = [0.05, 0.95, 0.]
    
    print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
    # 0.051293306
    
    print(categorical_crossentropy(y_true, y_pred, clip=True).numpy())
    # 0.051293306
    
    print(categorical_crossentropy(y_true, y_pred, clip=False).numpy())
    # 0.051293306
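
    As a side note, beyond the question itself: the usual way to avoid this saturation in training is to skip probabilities altogether and pass raw logits with from_logits=True, which computes the softmax and log in one numerically stable step and needs no clipping:

    loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

    # y_pred is now an unbounded score vector, not probabilities.
    print(loss_fn([[0., 1., 0.]], [[-10., 10., -10.]]).numpy())
    # ~0.0 -- no epsilon floor, so a confident correct prediction
    # can reach (numerically) zero loss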