python-3.x, tensorflow, keras, deep-learning, tensorflow2.0

Binary crossentropy and multi-label classification with from_logits set to True do not work as desired


I am working on a multi-label classification task and want to implement the network in TensorFlow. I know that I have to use binary cross-entropy (BCE) as the loss and a sigmoid activation for the last layer. For numerical stability I decided to set the from_logits option of BCE to True. The problem is that when I use the following code to check that everything is OK, the results do not match. In the code below, a corresponds to two samples; in the first one, the last two categories are present. b corresponds to the output of a network before it is passed through the sigmoid activation function.

import numpy as np
import tensorflow as tf

# Two samples; in the first one, the last two categories are present
a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
# Raw network outputs (logits), not yet passed through the sigmoid
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])
b_sigmoid = tf.sigmoid(b)
lf = tf.keras.losses.BinaryCrossentropy()
lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)

Calling lf(a, b_sigmoid) yields <tf.Tensor: shape=(), dtype=float64, numpy=2.0147683492886337>, whilst calling lt(a, b) yields <tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>. As you can see, they do not match. Can someone tell me why? They should yield the same number, or at least very similar numbers with only a small difference.


Solution

  • The issue is due to the epsilon constant that Keras uses as a fuzz factor to avoid numerical instability, e.g. the infinities produced by taking log(0.).

    This constant is used in the calculation of BinaryCrossentropy when from_logits is set to False. By default, this epsilon value is 1e-7, but it introduces some imprecision into the binary cross-entropy calculation. That is why the recommended usage is from_logits=True, where the cross entropy is computed directly from the logits and does not rely on such a constant (see the sketch at the end of this answer contrasting the two computation paths).

    If you want the two results to be closer to each other, you can use tf.keras.backend.set_epsilon to set the fuzz factor to a lower value. Using a value around 1e-15 yields much closer results for the two methods (but it could lead to numerical instabilities elsewhere).

    import numpy as np
    import tensorflow as tf

    # Lower the fuzz factor before computing the losses
    tf.keras.backend.set_epsilon(1e-15)

    a = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1.]])
    b = np.array([[18, -20, 20, 20],
                  [-18, 0., -10, 12]])
    b_sigmoid = tf.sigmoid(b)
    lf = tf.keras.losses.BinaryCrossentropy()
    lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    

    Testing it:

    >>> lt(a,b)
    <tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>
    >>> lf(a, b_sigmoid)
    <tf.Tensor: shape=(), dtype=float64, numpy=2.3366498369361732>
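
    For reference, here is a minimal NumPy sketch of the two computation paths that reproduces both numbers from the question. The from_logits=True path follows the numerically stable formulation documented for tf.nn.sigmoid_cross_entropy_with_logits; the from_logits=False path assumes the clip-and-add-epsilon behaviour of the Keras backend in this TensorFlow version (the exact clipping details may differ between versions):

    import numpy as np

    a = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1.]])
    b = np.array([[18, -20, 20, 20],
                  [-18, 0., -10, 12]])
    eps = 1e-7  # Keras' default fuzz factor

    # from_logits=True path: stable reformulation of
    # -(a * log(sigmoid(b)) + (1 - a) * log(1 - sigmoid(b)))
    # that never takes the log of a quantity that can underflow to 0
    stable = np.maximum(b, 0) - b * a + np.log1p(np.exp(-np.abs(b)))
    print(stable.mean(axis=-1).mean())   # ~2.33665, matches lt(a, b)

    # from_logits=False path: probabilities are clipped to [eps, 1 - eps] before
    # the logs are taken, so the ~18 nats contributed by the confidently wrong
    # first logit are capped at roughly -log(eps)
    p = np.clip(1.0 / (1.0 + np.exp(-b)), eps, 1.0 - eps)
    clipped = -(a * np.log(p + eps) + (1 - a) * np.log(1 - p + eps))
    print(clipped.mean(axis=-1).mean())  # ~2.0148, matches lf(a, b_sigmoid)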