Tags: python, tensorflow, keras, loss-function

Keras backend function seems to be working incorrectly


I am trying to implement a custom loss function in Keras.

To start it off, I wanted to be sure the previous loss function can be called from my custom function. And this is where the weird stuff begins:

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=['accuracy'])

works as expected.

Now the implementation of "sparse_categorical_crossentropy" in keras.losses is as follows:

def sparse_categorical_crossentropy(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred)

I concluded that passing K.sparse_categorical_crossentropy directly should also work. However, it throws "expected activation_6 to have shape (4,) but got array with shape (1,)".
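
Concretely, the call I mean is this (a sketch of the compile call with the backend function passed directly):

# K is keras.backend, as in the keras.losses implementation above
model.compile(loss=K.sparse_categorical_crossentropy,
              optimizer="adam",
              metrics=['accuracy'])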

Also, defining a custom loss function like this:

def custom_loss(y_true, y_pred):
    return keras.losses.sparse_categorical_crossentropy(y_true, y_pred)

does not work either. During training it reduces the loss (which seems correct), but the accuracy does not improve (whereas it does improve when using the built-in loss function).
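
For completeness, the compile call with this custom loss is the same as before, just with the function object instead of the string (a sketch):

# same compile call as above, only with the wrapper function as the loss
model.compile(loss=custom_loss, optimizer="adam", metrics=['accuracy'])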

I am not sure what is happening, neither do I know how to debug it properly. Any help would be highly appreciated.


Solution

  • I tested what you are describing with my own code and yes, you are right. I was initially getting the same error you were getting, but once I changed the metrics parameter from accuracy to sparse_categorical_accuracy, I started getting the expected higher accuracy.

    Here, the important thing to note is that when we tell Keras to use accuracy as the metric together with a loss passed as a custom function, Keras cannot infer which accuracy we mean and falls back to the default, categorical_accuracy. categorical_accuracy takes the argmax of y_true along the last axis, which for integer labels of shape (1,) is always 0, so the metric effectively measures how often the model predicts class 0 -- roughly 10% on MNIST -- even though the loss keeps decreasing (see Case 3 below). So, if we want to use our own custom loss function with sparse integer labels, we have to set the metrics parameter accordingly, e.g. to sparse_categorical_accuracy as in Case 4 (an equivalent form passing the metric function explicitly is sketched right after Case 4's output).

    Read about the available metrics functions in Keras here.

    Case 1:

    def sparse_categorical_crossentropy(y_true, y_pred):
        return K.sparse_categorical_crossentropy(y_true, y_pred)
    
    model.compile(optimizer='adam',
                  loss=sparse_categorical_crossentropy,
                  metrics=['accuracy'])
    

    output:

    ValueError: Error when checking target: expected dense_71 to have shape (10,) but got array with shape (1,)

    Case 2:

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    

    output:

    Epoch 1/2
    60000/60000 [==============================] - 2s 38us/step - loss: 0.4714 - acc: 0.8668
    Epoch 2/2
    60000/60000 [==============================] - 1s 22us/step - loss: 0.2227 - acc: 0.9362
    10000/10000 [==============================] - 1s 94us/step
    

    Case 3:

    def custom_sparse_categorical_crossentropy(y_true, y_pred):
        return K.sparse_categorical_crossentropy(y_true, y_pred)
    
    model.compile(optimizer='adam',             
                  loss=custom_sparse_categorical_crossentropy, 
                  metrics=['accuracy'])
    

    output:

    Epoch 1/2
    60000/60000 [==============================] - 2s 41us/step - loss: 0.4558 - acc: 0.1042
    Epoch 2/2
    60000/60000 [==============================] - 1s 22us/step - loss: 0.2164 - acc: 0.0997
    10000/10000 [==============================] - 1s 89us/step
    

    Case 4:

    def custom_sparse_categorical_crossentropy(y_true, y_pred):
        return K.sparse_categorical_crossentropy(y_true, y_pred)
    
    model.compile(optimizer='adam',
                  loss=custom_sparse_categorical_crossentropy,
                  metrics=['sparse_categorical_accuracy'])
    

    output:

    Epoch 1/2
    60000/60000 [==============================] - 2s 40us/step - loss: 0.4736 - sparse_categorical_accuracy: 0.8673
    Epoch 2/2
    60000/60000 [==============================] - 1s 23us/step - loss: 0.2222 - sparse_categorical_accuracy: 0.9372
    10000/10000 [==============================] - 1s 85us/step
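
    An equivalent way to write Case 4 is to pass the metric function itself instead of the string, which makes it explicit that sparse_categorical_accuracy (and not the default categorical_accuracy) is being used. A minimal sketch, assuming tf.keras:

    from tensorflow.keras.metrics import sparse_categorical_accuracy

    # Passing the metric function object is equivalent to the string
    # 'sparse_categorical_accuracy' and leaves Keras no room to fall back
    # to categorical_accuracy.
    model.compile(optimizer='adam',
                  loss=custom_sparse_categorical_crossentropy,
                  metrics=[sparse_categorical_accuracy])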
    

    Full Code:

    from __future__ import absolute_import, division, print_function
    import tensorflow as tf
    # use the backend that matches tf.keras; mixing the standalone keras
    # backend with tf.keras models can lead to graph/session mismatches
    from tensorflow.keras import backend as K
    
    
    mnist = tf.keras.datasets.mnist
    
    (x_train, y_train),(x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(100, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.10),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    
    def custom_sparse_categorical_crossentropy(y_true, y_pred):
        return K.sparse_categorical_crossentropy(y_true, y_pred)
    
    #def sparse_categorical_accuracy(y_true, y_pred):
    #    # reshape in case it's in shape (num_samples, 1) instead of (num_samples,)
    #    if K.ndim(y_true) == K.ndim(y_pred):
    #        y_true = K.squeeze(y_true, -1)
    #    # convert dense predictions to labels
    #    y_pred_labels = K.argmax(y_pred, axis=-1)
    #    y_pred_labels = K.cast(y_pred_labels, K.floatx())
    #    return K.cast(K.equal(y_true, y_pred_labels), K.floatx())
    
    model.compile(optimizer='adam',
                  loss=custom_sparse_categorical_crossentropy,
                  metrics=['sparse_categorical_accuracy'])
    
    history = model.fit(x_train, y_train, epochs=2, batch_size=200)
    model.evaluate(x_test, y_test)
    

    Check out the implementations of sparse_categorical_accuracy here and sparse_categorical_crossentropy here.