I'm using PyTorch's LSTM architecture to build a text generation model. For every batch I use pad_sequence so that each sequence is padded only up to the longest sequence in that batch, which means the batch dimensions (batch_size * seq_len) vary from batch to batch. I also apply pack_padded_sequence so that only the non-zero (non-padding) tokens are fed to the LSTM. However, feeding such a variable-size batch to the LSTM throws the following error:

Expected hidden[0] size (1, 8, 16), got (1, 16, 16)

In this error I have provided a batch of 16 sequences with 8 tokens each, but the hidden state I pass in has size (1, 16, 16).
I have tried creating the hidden state inside the forward function, but that did not work well. How can I create the hidden state so that it accepts batches of variable dimensions and is not lost over the course of a whole epoch?
class RNNModule(nn.Module):
    def __init__(self, n_vocab, seq_size, embedding_size, lstm_size):
        super(RNNModule, self).__init__()
        self.seq_size = seq_size
        self.lstm_size = lstm_size
        # Embedding layer built from a pretrained weight matrix
        # (create_emb_layer is defined elsewhere).
        self.embedding, num_embeddings, embedding_dim = create_emb_layer(weight_matrix, False)
        self.lstm = nn.LSTM(embedding_size,
                            lstm_size,
                            num_layers=flags.n_layers,
                            batch_first=True)
        self.dense = nn.Linear(lstm_size, n_vocab)

    def forward(self, x, length, prev_state):
        embed = self.embedding(x)
        # Pack the padded batch so the LSTM only sees the non-padding tokens.
        packed_input = db.pack_src(embed, length)
        packed_output, state = self.lstm(packed_input, prev_state)
        padded, _ = db.pad_pack(packed_output)
        logits = self.dense(padded)
        return logits, state

    def zero_state(self, batch_size=flags.batch_size):
        # (h_0, c_0), each of shape (num_layers, batch_size, lstm_size)
        return (torch.zeros(flags.n_layers, batch_size, self.lstm_size),
                torch.zeros(flags.n_layers, batch_size, self.lstm_size))
input: tensor([[ 19, 9, 4, 3, 68, 8, 6, 2],
[ 19, 9, 4, 3, 7, 8, 6, 2],
[ 3, 12, 17, 10, 6, 40, 2, 0],
[ 4, 3, 109, 7, 6, 2, 0, 0],
[ 188, 6, 7, 18, 3, 2, 0, 0],
[ 4, 3, 12, 6, 7, 2, 0, 0],
[ 6, 7, 3, 13, 2, 0, 0, 0],
[ 3, 28, 17, 69, 2, 0, 0, 0],
[ 6, 3, 12, 11, 2, 0, 0, 0],
[ 3, 13, 6, 7, 2, 0, 0, 0],
[ 3, 6, 7, 13, 2, 0, 0, 0],
[ 6, 3, 23, 7, 2, 0, 0, 0],
[ 3, 28, 10, 2, 0, 0, 0, 0],
[ 6, 3, 23, 2, 0, 0, 0, 0],
[ 3, 6, 37, 2, 0, 0, 0, 0],
[1218, 2, 0, 0, 0, 0, 0, 0]])
Zero tokens are padding. Embedding size: 64, LSTM size: 16, batch size: 16.
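For reference, the per-sequence lengths for this batch are just the counts of non-padding tokens in each row (a minimal sketch, assuming 0 is the padding index as stated above):

import torch

# The padded batch shown above; 0 is the padding index.
batch = torch.tensor([[  19,   9,   4,   3,  68,   8,   6,   2],
                      [  19,   9,   4,   3,   7,   8,   6,   2],
                      [   3,  12,  17,  10,   6,  40,   2,   0],
                      [   4,   3, 109,   7,   6,   2,   0,   0],
                      [ 188,   6,   7,  18,   3,   2,   0,   0],
                      [   4,   3,  12,   6,   7,   2,   0,   0],
                      [   6,   7,   3,  13,   2,   0,   0,   0],
                      [   3,  28,  17,  69,   2,   0,   0,   0],
                      [   6,   3,  12,  11,   2,   0,   0,   0],
                      [   3,  13,   6,   7,   2,   0,   0,   0],
                      [   3,   6,   7,  13,   2,   0,   0,   0],
                      [   6,   3,  23,   7,   2,   0,   0,   0],
                      [   3,  28,  10,   2,   0,   0,   0,   0],
                      [   6,   3,  23,   2,   0,   0,   0,   0],
                      [   3,   6,  37,   2,   0,   0,   0,   0],
                      [1218,   2,   0,   0,   0,   0,   0,   0]])

# Per-sequence lengths = number of non-padding tokens in each row.
lengths = (batch != 0).sum(dim=1)
print(lengths)  # tensor([8, 8, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 2])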
The hidden state you create has the correct size, but your input does not. When you pack it with nn.utils.rnn.pack_padded_sequence, you use batch_first=False (the default), but your data has size [batch_size, seq_len, embedding_size] when you pass it to the packing, so batch_size is its first dimension. For the LSTM itself you use batch_first=True, which is appropriate for your data.

You only need to pack it correctly by setting batch_first=True as well, to match the order of your data:

rnn_utils.pack_padded_sequence(embed, length, batch_first=True)
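To illustrate, here is a minimal, self-contained sketch (random embeddings stand in for your embedded batch; the lengths are taken from the batch shown above). Once the packing is done with batch_first=True, the LSTM accepts a hidden state of shape (num_layers, batch_size, lstm_size) without complaint:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size, seq_len, embedding_size, lstm_size, n_layers = 16, 8, 64, 16, 1

lstm = nn.LSTM(embedding_size, lstm_size, num_layers=n_layers, batch_first=True)

# Stand-in for the embedded batch: [batch_size, seq_len, embedding_size]
embed = torch.randn(batch_size, seq_len, embedding_size)
lengths = torch.tensor([8, 8, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 2])

# Hidden state sized for the current batch: (num_layers, batch_size, lstm_size)
state = (torch.zeros(n_layers, batch_size, lstm_size),
         torch.zeros(n_layers, batch_size, lstm_size))

packed = pack_padded_sequence(embed, lengths, batch_first=True)
output, state = lstm(packed, state)
padded, _ = pad_packed_sequence(output, batch_first=True)
print(padded.shape)  # torch.Size([16, 8, 16])

The batch size the LSTM infers from the packed input now matches the one used to create the hidden state, so the (1, 8, 16) vs. (1, 16, 16) mismatch disappears.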