I have a multi-label classification task, and I want to implement the network using TensorFlow. I know that I have to use binary cross-entropy (BCE) as the loss and a sigmoid as the activation of the last layer. To get a numerically stable computation, I decided to set the `from_logits` option of BCE to `True`. The problem is that when I ran the following code to check that everything is OK, the results did not match. In the code below, `a` corresponds to two samples (in the first one, the last two categories are present), and `b` corresponds to the output of a network that has not been passed through the sigmoid activation function.
```python
import numpy as np
import tensorflow as tf

# Two samples over four categories; 1 marks a present category.
a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
# Raw network outputs (logits), not yet passed through a sigmoid.
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])
b_sigmoid = tf.sigmoid(b)

lf = tf.keras.losses.BinaryCrossentropy()
lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```
Calling `lf(a, b_sigmoid)` yields `<tf.Tensor: shape=(), dtype=float64, numpy=2.0147683492886337>`, whilst calling `lt(a, b)` yields `<tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>`. As you can see, they do not match. Can someone tell me why? They should yield the same number, or at least similar numbers with only a small difference.
The issue is due to the epsilon constant that Keras uses as a fuzz factor to avoid numerical instability and `NaN` values (like the result of `log(-1.)`).

This constant is used in the calculation of `BinaryCrossentropy` when `from_logits` is set to `False`: the predicted probabilities are clipped to the range `[epsilon, 1 - epsilon]` before the logarithm is taken. By default, this epsilon value is `1e-7`, which induces some imprecision whenever the probabilities saturate. In your example, `sigmoid(18) ≈ 1 - 1.5e-8` is clipped down to `1 - 1e-7`, so its loss term shrinks from roughly 18 to roughly 16.1, which accounts for most of the gap you observe. That's why the recommended usage is `from_logits=True`, where the cross-entropy is computed directly from the logits and does not rely on such a constant.
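To see the effect in isolation, here is a rough NumPy sketch of the non-logits path, using a hypothetical `bce_from_probs` helper. It assumes the probabilities are clipped to `[eps, 1 - eps]` and that `eps` is also added inside the `log`, which is what older TF 2.x backends did; the exact details vary across versions:

```python
import numpy as np

a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])
p = 1 / (1 + np.exp(-b))  # sigmoid of the logits

def bce_from_probs(y, p, eps=1e-7):
    # Hypothetical sketch of the non-logits path: keep the
    # probabilities away from 0 and 1 with the fuzz factor.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

print(bce_from_probs(a, p))         # ~2.0148: the saturated sigmoid(18)
                                    # term is distorted by the clipping
print(bce_from_probs(a, p, 1e-15))  # ~2.3366: close to from_logits=True
```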
If you want the two results to be closer to each other, you can use `tf.keras.backend.set_epsilon` to set the fuzz factor to a lower value. Using a value around `1e-15` yields much closer results for the two methods (but could lead to numerical instabilities elsewhere):
```python
import numpy as np
import tensorflow as tf

# Lower the fuzz factor before computing the losses.
tf.keras.backend.set_epsilon(1e-15)

a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])
b_sigmoid = tf.sigmoid(b)

lf = tf.keras.losses.BinaryCrossentropy()
lt = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```
Testing it:

```python
>>> lt(a, b)
<tf.Tensor: shape=(), dtype=float64, numpy=2.336649845037007>
>>> lf(a, b_sigmoid)
<tf.Tensor: shape=(), dtype=float64, numpy=2.3366498369361732>
```
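As a final sanity check: the reason the logits path needs no fuzz factor is that TensorFlow evaluates sigmoid cross-entropy directly from the logits in the numerically stable form `max(z, 0) - z*y + log(1 + exp(-|z|))` (the formulation documented for `tf.nn.sigmoid_cross_entropy_with_logits`). A quick sketch reproducing the `lt(a, b)` value by hand:

```python
import numpy as np

a = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1.]])
b = np.array([[18, -20, 20, 20],
              [-18, 0., -10, 12]])

# Stable elementwise sigmoid cross-entropy from logits z and labels y:
# max(z, 0) - z*y + log(1 + exp(-|z|))
loss = np.maximum(b, 0) - b * a + np.log1p(np.exp(-np.abs(b)))
print(loss.mean())  # ~2.3366498, matching lt(a, b)
```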