I have a sequence continuation/prediction task (input: a sequence of class indices, output: a sequence of class indices) and I use PyTorch.
My neural network returns a tensor of shape (batch_size, sequence_length, numb_classes) where each entry is proportional to the probability that the class with that index is the next class in the sequence. My targets in the training data are of shape (batch_size, sequence_length) (just the sequences of the true class indices).
I want to use nn.CrossEntropyLoss.
My question: how do I use the cross-entropy loss function? Which input shapes are required?
Thank you!
The documentation page of nn.CrossEntropyLoss clearly states:

> Input: shape (C), (N, C) or (N, C, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss.
>
> Target: If containing class indices, shape (), (N) or (N, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss, where each value should be in [0, C). If containing class probabilities, same shape as the input, and each value should be in [0, 1].
Just to be crystal clear, "input" refers to your model's output prediction, while "target" is the label tensor. In a nutshell, the target must have one dimension fewer than the input: the input's extra dimension holds the per-class logit values, whereas the target is in the so-called dense format, containing only the class indices of the true labels.
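To illustrate with the simplest (N, C) case before getting to your sequence setup, here is a minimal sketch (the tensor names are mine, not from your code):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# input: a batch of 4 predictions over 10 classes, shape (N, C)
logits = torch.randn(4, 10)

# target: dense format, one class index per sample, shape (N)
# each value must lie in [0, C), here [0, 10)
target = torch.tensor([1, 0, 9, 3])

loss = criterion(logits, target)  # scalar tensor
```

Note that the logits are raw, unnormalized scores: nn.CrossEntropyLoss applies log-softmax internally, so you must not pass them through a softmax yourself.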
The example you give corresponds to the use case of:
#input = (batch_size, sequence_length, numb_classes)
#target = (batch_size, sequence_length)
This matches the case of #input = (N, C, d_1) and #target = (N, d_1), except that the class dimension is in the wrong position. That is, you need to permute the axes, or transpose two axes, of your input tensor so that it has shape (batch_size, numb_classes, sequence_length), which is (N, C, d_1). You can do so with either torch.Tensor.transpose or torch.Tensor.permute:
>>> input.permute(0, 2, 1)
or
>>> input.transpose(1, 2)
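Putting it all together for your sequence case, a minimal end-to-end sketch might look like this (the dimension sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

batch_size, sequence_length, numb_classes = 2, 5, 7
criterion = nn.CrossEntropyLoss()

# model output: (batch_size, sequence_length, numb_classes)
output = torch.randn(batch_size, sequence_length, numb_classes)

# targets: (batch_size, sequence_length), class indices in [0, numb_classes)
target = torch.randint(numb_classes, (batch_size, sequence_length))

# move the class dimension to position 1 -> (N, C, d_1)
loss = criterion(output.permute(0, 2, 1), target)
```

Here output.permute(0, 2, 1) and output.transpose(1, 2) are interchangeable, since only two axes need to swap; permute is the more general tool when more axes are involved.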