python-3.x, tensorflow, keras, deep-learning, tensorflow2.0

Binary crossentropy and multi-label classification with from_logits set to True do not work as desired


I am working on a multi-label classification task and want to implement the network in TensorFlow. I know that I have to use binary cross-entropy (BCE) as the loss and a sigmoid activation for the last layer. For numerical stability I decided to set the from_logits option of BCE to True. The problem is that when I use the following code to check that everything is OK, the results do not match. In the code below, a corresponds to two samples; in the first one, the last two categories are present. b corresponds to the output of a network before it is passed through the sigmoid activation function.

import numpy as np
import tensorflow as tf

# Two samples; in the first one, the last two categories are present
a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
# Raw network outputs (logits), not yet passed through the sigmoid
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])
b_sigmoid = tf.sigmoid(b)
lf = tf.keras.losses.BinaryCrossentropy()
lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)

Calling lf(a, b_sigmoid) yields <tf.Tensor: shape=(), dtype=float64, numpy=2.0147683492886337>, whilst calling lt(a, b) yields <tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>. As you can see, they do not match. Can someone tell me why? They should yield the same number, or at least very similar numbers with only a small difference.


Solution

  • The issue is due to the epsilon constant that Keras uses as a fuzz factor to avoid numerical instability, e.g. the infinities produced by taking log(0.).

    This constant is used in the calculation of BinaryCrossentropy when from_logits is set to False. By default, this epsilon value is 1e-7, but it introduces some imprecision into the binary cross-entropy calculation. That is why the recommended usage is from_logits=True, where the cross entropy is computed directly from the logits and does not rely on such a constant (see the sketch at the end of this answer contrasting the two computation paths).

    If you want the two results to be closer to each other, you can use tf.keras.backend.set_epsilon to set the fuzz factor to a lower value. Using a value around 1e-15 yields much closer results for the two methods (but it could lead to numerical instabilities elsewhere).

    import numpy as np
    import tensorflow as tf

    # Lower the fuzz factor before computing the losses
    tf.keras.backend.set_epsilon(1e-15)

    a = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1.]])
    b = np.array([[18, -20, 20, 20],
                  [-18, 0., -10, 12]])
    b_sigmoid = tf.sigmoid(b)
    lf = tf.keras.losses.BinaryCrossentropy()
    lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    

    Testing it:

    >>> lt(a,b)
    <tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>
    >>> lf(a, b_sigmoid)
    <tf.Tensor: shape=(), dtype=float64, numpy=2.3366498369361732>
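
    For reference, here is a minimal NumPy sketch of the two computation paths that reproduces both numbers from the question. The from_logits=True path follows the numerically stable formulation documented for tf.nn.sigmoid_cross_entropy_with_logits; the from_logits=False path assumes the clip-and-add-epsilon behaviour of the Keras backend in this TensorFlow version (the exact clipping details may differ between versions):

    import numpy as np

    a = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1.]])
    b = np.array([[18, -20, 20, 20],
                  [-18, 0., -10, 12]])
    eps = 1e-7  # Keras' default fuzz factor

    # from_logits=True path: stable reformulation of
    # -(a * log(sigmoid(b)) + (1 - a) * log(1 - sigmoid(b)))
    # that never takes the log of a quantity that can underflow to 0
    stable = np.maximum(b, 0) - b * a + np.log1p(np.exp(-np.abs(b)))
    print(stable.mean(axis=-1).mean())   # ~2.33665, matches lt(a, b)

    # from_logits=False path: probabilities are clipped to [eps, 1 - eps] before
    # the logs are taken, so the ~18 nats contributed by the confidently wrong
    # first logit are capped at roughly -log(eps)
    p = np.clip(1.0 / (1.0 + np.exp(-b)), eps, 1.0 - eps)
    clipped = -(a * np.log(p + eps) + (1 - a) * np.log(1 - p + eps))
    print(clipped.mean(axis=-1).mean())  # ~2.0148, matches lf(a, b_sigmoid)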