I am trying to understand the TensorFlow implementation of image captioning with visual attention. I understand what SparseCategoricalCrossentropy does, but what is loss_function
doing? Can someone explain? TensorFlow implementation:
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction='none')

    def loss_function(real, pred):
        mask = tf.math.logical_not(tf.math.equal(real, 0))
        loss_ = loss_object(real, pred)
        mask = tf.cast(mask, dtype=loss_.dtype)
        loss_ *= mask
        return tf.reduce_mean(loss_)
We need to go back to what real contains. In real, the words are encoded as integers by tf.keras.preprocessing.text.Tokenizer. In the tutorial, the value 0 is reserved for the <pad> token:
tokenizer.word_index['<pad>'] = 0
So the loss function simply applies a mask that discards the loss at positions whose target is the <pad> token, because those positions carry no meaningful information for training the network.
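To make the masking concrete, here is a minimal pure-Python sketch of the same arithmetic, written without TensorFlow so the numbers are easy to follow. It is not the tutorial's code: the names PAD_ID, sparse_cross_entropy and masked_loss, and the small batch of logits, are made up for illustration, and it assumes real holds one target id per batch example and pred holds one vector of logits per example.

```python
import math

PAD_ID = 0  # assumption: 0 is the <pad> index, as in the tutorial


def sparse_cross_entropy(target, logits):
    """Cross-entropy of one integer target against unnormalised logits."""
    # Numerically stable log-sum-exp over the vocabulary.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def masked_loss(real, pred):
    """Illustrative mirror of loss_function for one batch of targets.

    real: list of target token ids, one per example.
    pred: list of logit vectors, one per example.
    """
    # Equivalent of reduction='none': keep one loss value per example.
    per_example = [sparse_cross_entropy(t, p) for t, p in zip(real, pred)]
    # 0.0 where the target is <pad>, 1.0 everywhere else.
    mask = [0.0 if t == PAD_ID else 1.0 for t in real]
    masked = [value * keep for value, keep in zip(per_example, mask)]
    # Like tf.reduce_mean, divide by the number of entries, padded ones included.
    return sum(masked) / len(masked)


# Hypothetical batch of 3 examples over a 4-word vocabulary.
real = [2, 0, 1]  # the second target is <pad>
pred = [
    [0.1, 0.2, 3.0, 0.3],
    [5.0, 0.0, 0.0, 0.0],  # contributes nothing to the loss
    [0.0, 2.0, 0.5, 0.1],
]
loss = masked_loss(real, pred)
```

Whatever logits appear in the second row, loss stays the same, which is the point of the mask. Note that the mean is still taken over all three entries, so padded positions lower the average rather than being excluded from the count.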