Custom Keras binary_crossentropy loss function not working

I’m trying to re-define keras’s binary_crossentropy loss function so that I can customize it but it’s not giving me the same results as the existing one.

I'm using TF 1.13.1 with Keras 2.2.4.

I went through Keras’s github code. My understanding is that the loss in model.compile(optimizer='adam', loss='binary_crossentropy', metrics =['accuracy']), is defined in losses.py, using binary_crossentropy defined in tensorflow_backend.py.

I ran a dummy data and model to test it. Here are my findings:

The custom loss function outputs the same results as keras’s one
Using the custom loss in a keras model gives different accuracy results

from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)

import tensorflow as tf
from keras import losses
import keras.backend as K
import keras.backend.tensorflow_backend as tfb
from keras.layers import Dense
from keras import Sequential

#Dummy check of loss output
def binary_crossentropy_custom(y_true, y_pred):
    return K.mean(binary_crossentropy_custom_tf(y_true, y_pred), axis=-1)

def binary_crossentropy_custom_tf(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

logits = tf.constant([[-3., -2.11, -1.22],
                     [-0.33, 0.55, 1.44],
                     [2.33, 3.22, 4.11]])

labels = tf.constant([[1., 1., 1.], 
                      [1., 1., 0.], 
                      [0., 0., 0.]])

custom_sigmoid_cross_entropy_with_logits = binary_crossentropy_custom(labels, logits)
keras_binary_crossentropy = losses.binary_crossentropy(y_true=labels, y_pred=logits)

with tf.Session() as sess:
    print('CUSTOM sigmoid_cross_entropy_with_logits: ', sess.run(custom_sigmoid_cross_entropy_with_logits), '\n')
    print('KERAS keras_binary_crossentropy: ', sess.run(keras_binary_crossentropy), '\n')

#CUSTOM sigmoid_cross_entropy_with_logits:  [16.118095 10.886106 15.942386] 

#KERAS keras_binary_crossentropy:  [16.118095 10.886106 15.942386] 

#Dummy check of model accuracy

X_train = tf.random.uniform((3, 5), minval=0, maxval=1, dtype=tf.dtypes.float32)
labels = tf.constant([[1., 0., 0.], 
                      [0., 0., 1.], 
                      [1., 0., 0.]])

model = Sequential()
#First Hidden Layer
model.add(Dense(5, activation='relu', kernel_initializer='random_normal', input_dim=5))
#Output Layer
model.add(Dense(3, activation='sigmoid', kernel_initializer='random_normal'))

#I ran model.fit for each model.compile below 10 times using the same X_train and provide the range of accuracy measurement
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics =['accuracy']) #0.748 < acc < 0.779
# model.compile(optimizer='adam', loss=losses.binary_crossentropy, metrics =['accuracy']) #0.761 < acc < 0.778
model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics =['accuracy']) #0.617 < acc < 0.663

history = model.fit(X_train, labels, steps_per_epoch=100, epochs=1)

I'd expect the custom loss function to give similar model accuracy output but it does not. Any idea? Thanks!

Solution

Keras automatically selects which accuracy implementation to use according to the loss, and this won't work if you use a custom loss. But in this case you can just explictly use the right accuracy, which is binary_accuracy:

model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics =['binary_accuracy'])