
TensorFlow - predicting next word - loss function logit and target shape


I'm trying to create a language model. I have logit and target tensors, each of shape [32, 312, 512], where:

  • .shape[0] is batch_size
  • .shape[1] is sequence_max_len
  • .shape[2] is the vocabulary size
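
For context, here is a minimal sketch of how tensors with these shapes could be produced (assuming TF 1.x; the RNN-output placeholder and the dense projection are illustrative, not my actual model):

import tensorflow as tf

batch_size, sequence_max_len, vocab_size = 32, 312, 512

# One-hot targets: [batch_size, sequence_max_len, vocab_size]
y = tf.placeholder(tf.float32, [batch_size, sequence_max_len, vocab_size])

# RNN outputs (hidden size 128 is an arbitrary choice), projected to the
# vocabulary so the logits have the same shape as the targets.
rnn_outputs = tf.placeholder(tf.float32, [batch_size, sequence_max_len, 128])
logit = tf.layers.dense(rnn_outputs, vocab_size)  # [32, 312, 512]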

The question is - when I pass logit and target to the loss function as follows:

self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))

Does it compute the appropriate loss for the current batch? Or should I reshape logit and target to the shape [32, 312*512]?

Thanks in advance for your help!


Solution

  • The answer is: no reshaping is needed, since tf.nn.softmax_cross_entropy_with_logits() has a dim argument:

    dim: The class dimension. Defaulted to -1 which is the last dimension.
    

    Also, inside tf.nn.softmax_cross_entropy_with_logits() there is this code, which flattens the outer (batch, time) dimensions for you:

    # Make precise_logits and labels into matrices.
    precise_logits = _flatten_outer_dims(precise_logits)
    labels = _flatten_outer_dims(labels)
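
    Note that reshaping to [32, 312*512] would actually change the result: it would merge the time steps into the class dimension, so the softmax would run over 312*512 "classes". The correct flattening is [32*312, 512], which, as shown above, the op already performs internally.

    To convince yourself, here is a minimal sketch (assuming TF 1.x, matching the API in the question; the dummy data is random and all variable names are illustrative) showing that the plain 3-D call and an explicit [32*312, 512] flattening give the same mean loss:

    import numpy as np
    import tensorflow as tf

    batch_size, seq_len, vocab_size = 32, 312, 512

    # Random logits and random one-hot targets with the shapes from the question.
    logits = np.random.randn(batch_size, seq_len, vocab_size).astype(np.float32)
    targets = np.eye(vocab_size, dtype=np.float32)[
        np.random.randint(vocab_size, size=(batch_size, seq_len))]

    # Loss on the 3-D tensors: dim defaults to -1, the vocabulary axis.
    loss_3d = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=targets))

    # The same loss after flattening the outer (batch, time) dimensions by hand.
    loss_2d = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=logits.reshape(-1, vocab_size),
            labels=targets.reshape(-1, vocab_size)))

    with tf.Session() as sess:
        print(sess.run([loss_3d, loss_2d]))  # two (near-)identical values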