I'm trying to build a text classifier network using an LSTM. The error I'm getting is:
RuntimeError: Expected hidden[0] size (4, 600, 256), got (4, 64, 256)
The data is JSON and looks like this:
{"cat": "music", "desc": "I'm in love with the song's intro!", "sent": "h"}
I'm using torchtext
to load the data.
from torchtext import data
from torchtext import datasets
TEXT = data.Field(fix_length = 600)
LABEL = data.Field(fix_length = 10)
BATCH_SIZE = 64
fields = {
'cat': ('c', LABEL),
'desc': ('d', TEXT),
'sent': ('s', LABEL),
}
My LSTM looks like this
EMBEDDING_DIM = 64
HIDDEN_DIM = 256
N_LAYERS = 4
MyLSTM(
(embedding): Embedding(11967, 64)
(lstm): LSTM(64, 256, num_layers=4, batch_first=True, dropout=0.5)
(dropout): Dropout(p=0.3, inplace=False)
(fc): Linear(in_features=256, out_features=8, bias=True)
(sig): Sigmoid()
)
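The printed module above can be reproduced with a definition along these lines (a sketch; the `forward` method and its signature are my assumptions, only the layer configuration comes from the printed repr):

```python
import torch
import torch.nn as nn

class MyLSTM(nn.Module):
    # Reconstructed from the printed module; forward() is an assumption.
    def __init__(self, vocab_size=11967, embedding_dim=64, hidden_dim=256,
                 n_layers=4, output_dim=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers,
                            batch_first=True, dropout=0.5)
        self.dropout = nn.Dropout(p=0.3)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden):
        emb = self.embedding(x)               # (..., embedding_dim)
        out, hidden = self.lstm(emb, hidden)  # shape depends on batch_first
        out = self.dropout(out)
        return self.sig(self.fc(out)), hidden
```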
I end up with the following dimensions for the inputs
and labels
batch = list(train_iterator)[0]
inputs, labels = batch
print(inputs.shape) # torch.Size([600, 64])
print(labels.shape) # torch.Size([100, 2, 64])
And my initialized hidden tensor looks like:
hidden # [torch.Size([4, 64, 256]), torch.Size([4, 64, 256])]
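For reference, a pair of tensors with those sizes can be created like this (a minimal sketch; zero-initialization is an assumption, any initialization with these shapes gives the same sizes):

```python
import torch

N_LAYERS, BATCH_SIZE, HIDDEN_DIM = 4, 64, 256

# (h_0, c_0): each of shape (num_layers * num_directions, batch, hidden_size)
hidden = (torch.zeros(N_LAYERS, BATCH_SIZE, HIDDEN_DIM),
          torch.zeros(N_LAYERS, BATCH_SIZE, HIDDEN_DIM))
print([h.shape for h in hidden])  # [torch.Size([4, 64, 256]), torch.Size([4, 64, 256])]
```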
I'm trying to understand what the dimensions at each step should be. Should the hidden dimension be initialized to (4, 600, 256) or (4, 64, 256)?
The documentation of nn.LSTM
- Inputs explains what the dimensions are:
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
Therefore, your hidden state should have size (4, 64, 256), so you did that correctly. On the other hand, you are not providing the correct size for the input.
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See
torch.nn.utils.rnn.pack_padded_sequence()
or torch.nn.utils.rnn.pack_sequence()
for details.
While it says that the size of the input needs to be (seq_len, batch, input_size), you've set batch_first=True
in your LSTM, which swaps batch and seq_len. Therefore your input should have size (batch_size, seq_len, input_size), but that is not the case: your input has seq_len first (600) and batch second (64). That is the default in torchtext because it is the more common representation, and it also matches the default behaviour of LSTM.
You need to set batch_first=False
in your LSTM.
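To see why this resolves the mismatch, here is a small sketch using the exact dimensions from the question (variable names are mine; torch.no_grad() is only there to keep the forward pass cheap):

```python
import torch
import torch.nn as nn

SEQ_LEN, BATCH, EMB, HID, LAYERS = 600, 64, 64, 256, 4

embedding = nn.Embedding(11967, EMB)
# batch_first=False (the default), so the LSTM expects (seq_len, batch, input_size)
lstm = nn.LSTM(EMB, HID, num_layers=LAYERS, batch_first=False, dropout=0.5)

# torchtext's default batch layout: (seq_len, batch) -> matches the LSTM default
inputs = torch.zeros(SEQ_LEN, BATCH, dtype=torch.long)
hidden = (torch.zeros(LAYERS, BATCH, HID),
          torch.zeros(LAYERS, BATCH, HID))

with torch.no_grad():
    out, (h_n, c_n) = lstm(embedding(inputs), hidden)

print(out.shape)  # (seq_len, batch, hidden_size)
print(h_n.shape)  # (num_layers, batch, hidden_size)
```

With batch_first=True and the same (600, 64) input, the LSTM would read 600 as the batch size and raise exactly the "Expected hidden[0] size (4, 600, 256), got (4, 64, 256)" error from the question.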
Alternatively, if you prefer having batch as the first dimension in general, torchtext.data.Field
also has the batch_first
option.