To handle variable-length input sequences, all input sequences are padded to the same length. This affects the value of the loss, so the loss tensor is multiplied by a mask tensor to set the loss of the padded elements to 0. But when taking the mean of the loss with tf.math.reduce_mean or tf.keras.metrics.Mean, the padded elements still affect the mean, because they are still counted in the denominator.
So my question is: how do I take the mean of the masked loss in TensorFlow?
For example:
import tensorflow as tf
t = tf.constant([1, 2, 3])
t = tf.pad(t, [[0, 3]])  # padding, now t = [1, 2, 3, 0, 0, 0]
mask = tf.constant([True, True, True, False, False, False])
loss = tf.constant([0.1, 0.2, 0.3, 0.12, 0.2, 0.4])  # notice padded elements contribute to loss
loss = loss * tf.cast(mask, loss.dtype)  # loss = [0.1, 0.2, 0.3, 0, 0, 0]
Now I want something like:
Mean(loss) = 0.2, which is (0.1 + 0.2 + 0.3) / 3
not something like:
Mean(loss) = 0.1, which is (0.1 + 0.2 + 0.3 + 0 + 0 + 0)/6
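Written as a formula, the quantity wanted here is a masked mean. In this sketch, ℓ_i is the loss at position i and m_i is 1 for a real element and 0 for a padded one (this notation is introduced here for illustration):

```latex
\operatorname{mean}(\ell, m) = \frac{\sum_i m_i \ell_i}{\sum_i m_i}
  = \frac{0.1 + 0.2 + 0.3}{3} = 0.2
```

Plain reduce_mean instead divides by the total number of positions, 6, which gives 0.6 / 6 = 0.1.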
Refer to this TensorFlow Google group.
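Divide the reduce_sum of the masked loss by the reduce_sum of the mask, which is the number of unpadded elements. Cast the boolean mask to a numeric type first, because tf.math.reduce_sum does not accept bool tensors.
mean = tf.math.reduce_sum(loss) / tf.math.reduce_sum(tf.cast(mask, loss.dtype))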
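A minimal, self-contained sketch of the same idea follows. The helper name `masked_mean` and the example batch values are hypothetical. It also shows two related cases. The first is a padded batch, where per-sequence lengths are turned into a mask with `tf.sequence_mask`. The second is `tf.keras.metrics.Mean` with the mask passed as `sample_weight`, which for a 1-D loss computes sum(values * weight) / sum(weight).

```python
import tensorflow as tf


def masked_mean(loss, mask):
    """Mean of `loss` over the positions where `mask` is True (hypothetical helper)."""
    # reduce_sum does not accept bool tensors, so cast the mask to the loss dtype.
    mask = tf.cast(mask, loss.dtype)
    # Zero out padded positions, then divide by the number of real positions.
    return tf.math.reduce_sum(loss * mask) / tf.math.reduce_sum(mask)


# The example from the question.
loss = tf.constant([0.1, 0.2, 0.3, 0.12, 0.2, 0.4])
mask = tf.constant([True, True, True, False, False, False])
print(float(masked_mean(loss, mask)))  # approximately 0.2

# Batched case (made-up values): sequence lengths 3 and 2, padded to length 4.
batch_loss = tf.constant([[0.1, 0.2, 0.3, 0.5],
                          [0.4, 0.6, 0.9, 0.9]])
lengths = tf.constant([3, 2])
batch_mask = tf.sequence_mask(lengths, maxlen=4)  # bool tensor of shape (2, 4)
print(float(masked_mean(batch_loss, batch_mask)))  # approximately (0.6 + 1.0) / 5 = 0.32

# Keras metric: the mask acts as a per-element weight, so padded elements
# contribute nothing to either the numerator or the denominator.
metric = tf.keras.metrics.Mean()
metric.update_state(loss, sample_weight=tf.cast(mask, loss.dtype))
print(float(metric.result()))  # approximately 0.2
```

Dividing by the sum of the mask rather than by the sequence length means the same helper works for a single sequence and for a whole padded batch, since every real element in the batch gets equal weight.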