Tags: python, recurrent-neural-network, chainer

How to convert padded sequence tensor to expected RNN format?


I have a tensor of shape (batch_size, max_sequence_length, embedding_size) that stores sequences padded to the maximum length. It is computed from a padded ID matrix of shape (batch_size, max_sequence_length) over a vocabulary of size vocab, for example:

# batch_size many rows; shape (batch_size, 4), vocabulary size 8
[2, 4, 1, 4]
[7, 4, 2, 0]
[6, 0, 0, 0]
# Using EmbedID(ignore_label=0) to get (batch_size, 4, embedding_size)

How can we pass this to, for example, an NStepGRU link in Chainer, and obtain the final hidden state of all the sequences, of shape (batch_size, embedding_size)?
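For concreteness, a minimal sketch of this setup (the sizes vocab_size = 8 and embed_size = 5 are illustrative assumptions, not taken from the question):

import numpy as np
import chainer.links as L

vocab_size, embed_size = 8, 5          # illustrative sizes
ids = np.array([[2, 4, 1, 4],
                [7, 4, 2, 0],
                [6, 0, 0, 0]], dtype=np.int32)   # padded with 0

embed = L.EmbedID(vocab_size, embed_size, ignore_label=0)
x = embed(ids)                          # Variable of shape (batch_size, 4, embed_size)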


Solution

  • NStepGRU accepts a batch of sequences as a list whose i-th element is an array of shape (sequence_length_i, embedding_size). Note that padding is not needed here; each sequence may have a different length.

    If you have a tensor x of shape (batch_size, max_sequence_length, embedding_size) and the true sequence lengths lengths, you can pass [x[i, :l] for i, l in enumerate(lengths)] to NStepGRU (see the sketch below).

    NStepGRU returns ys, the output sequences of the last layer, and hs, the final hidden state. Since NStepGRU may contain multiple layers, the final hidden state is provided for each layer; i.e., hs has the shape (num_layers, batch_size, hidden_size), where hidden_size equals embedding_size if you set the GRU's output size to the embedding size. If you are using a single-layer NStepGRU, hs[0] is the final hidden state of shape (batch_size, hidden_size).
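    Putting the pieces together, a minimal sketch assuming a single-layer NStepGRU and illustrative sizes (vocab_size = 8, embed_size = 5, hidden_size = 5; lengths holds the true length of each padded sequence):

    import numpy as np
    import chainer.links as L

    vocab_size, embed_size, hidden_size = 8, 5, 5
    ids = np.array([[2, 4, 1, 4],
                    [7, 4, 2, 0],
                    [6, 0, 0, 0]], dtype=np.int32)     # 0 is the padding label
    lengths = [4, 3, 1]                                 # true sequence lengths

    embed = L.EmbedID(vocab_size, embed_size, ignore_label=0)
    gru = L.NStepGRU(1, embed_size, hidden_size, 0.0)   # n_layers, in_size, out_size, dropout

    x = embed(ids)                                      # (batch_size, 4, embed_size)
    xs = [x[i, :l] for i, l in enumerate(lengths)]      # strip padding per sequence

    hs, ys = gru(None, xs)                              # hx=None -> zero initial hidden state

    final_hidden = hs[0]                                # (batch_size, hidden_size), single layer
    print(final_hidden.shape)                           # (3, 5)

    With more than one layer, hs[-1] picks the final hidden state of the top layer.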