python pytorch neural-network lstm seq2seq

PyTorch nn.LSTM: RuntimeError: For unbatched 2-D input, hx and cx should also be 2-D but got (3-D, 3-D) tensors


I am trying to write a seq2seq translator from English to Spanish, with an LSTM encoder and an LSTM decoder.
During the forward pass, I get the following error:
RuntimeError: For unbatched 2-D input, hx and cx should also be 2-D but got (3-D, 3-D) tensors
Interestingly, the encoder seems to work during the forward pass; it is the decoder that throws the error.

This is my neural net. I included several print statements to understand possible transformations and shapes of data for learning and debugging.

import torch
import torch.nn as nn

class Seq2SeqModel(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, num_layers):
    super(Seq2SeqModel, self).__init__()
    # Store the sizes so forward() does not depend on global variables.
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.encoder = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
    self.decoder = nn.LSTM(input_size=68, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
    self.fc = nn.Linear(in_features=hidden_size, out_features=output_size)

  def forward(self, x, y):
    print(f"x shape: {x.shape}")
    print(f"y shape: {y.shape}")
    print("Set Memories")
    # Initial hidden and cell states: (num_layers, batch, hidden_size).
    h_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
    c_0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
    print(f"h: {h_0.shape}")
    print(f"c: {c_0.shape}")
    print("Encoder...")
    _, (hidden, cell) = self.encoder(x, (h_0, c_0))
    print("Results:")
    print(hidden.shape, cell.shape)
    print("Decoder:")
    # This call raises the RuntimeError.
    # Also tried: self.decoder(y, (hidden.view(1, 32, 256), cell.view(1, 32, 256)))
    decoder_output, _ = self.decoder(y, (hidden, cell))
    print("Results: ", decoder_output.shape)
    output = self.fc(decoder_output)
    return output

The output of a training step looks like this:

x shape: torch.Size([32, 70, 100])
y shape: torch.Size([32, 68])
Set Memories
h: torch.Size([1, 32, 256])
c: torch.Size([1, 32, 256])
Encoder...
Results:
torch.Size([1, 32, 256]) torch.Size([1, 32, 256])
Decoder:

x holds the English sentences. The batch size is 32, the maximum length including padding is 70 (every sentence is padded to 70 words), and each of these 70 words is represented by a 100-dimensional Word2Vec embedding vector.
y holds the Spanish sentences. Again, the batch size is 32, but the maximum length including padding is 68. Each Spanish word is represented by a single number (no vector embedding).

The initialization of the neural net looks like this:

model = Seq2SeqModel(input_size=100, hidden_size=256, output_size=68, num_layers=1)

My knowledge about LSTMs is quite vague, so maybe data has to be formatted differently to be fed into the neural net correctly. I experimented with different dimensions of the data before feeding it through the network but nothing worked for me.

Does anyone know what to change?

Thanks in advance!😇


Solution

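  • You can't send a tensor of ints into the LSTM model. The y tensor needs to be 3-D like the x tensor, i.e. (batch, seq_len, features). Because y is currently 2-D, nn.LSTM treats it as a single unbatched sequence and therefore expects 2-D hx and cx, which is why the 3-D states coming out of the encoder trigger the error. You also need to iterate over the decoder inputs.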

    You can check out this example of seq2seq in pytorch.
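
As a rough illustration of the points above, here is a minimal sketch of how the model could be restructured. It is not the asker's final code. It assumes an nn.Embedding layer is added to turn the integer Spanish token ids into float vectors, and the names vocab_size, emb_dim, sos_id and greedy_decode, as well as the values 5000 and 64, are hypothetical choices made only for this example. The forward method feeds the whole target sequence in one call (teacher forcing), while greedy_decode shows the step-by-step iteration over decoder inputs that is needed when generating a translation.

```python
import torch
import torch.nn as nn


class Seq2SeqModel(nn.Module):
    def __init__(self, input_size, hidden_size, vocab_size, emb_dim, num_layers=1):
        super().__init__()
        self.encoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # Assumption: an embedding layer maps integer token ids to float vectors.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.decoder = nn.LSTM(emb_dim, hidden_size, num_layers, batch_first=True)
        # Project each decoder step onto the (hypothetical) Spanish vocabulary.
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, y):
        # x: (batch, src_len, input_size) floats; y: (batch, tgt_len) integer ids.
        _, state = self.encoder(x)         # initial states default to zeros
        emb = self.embedding(y)            # (batch, tgt_len, emb_dim), now 3-D
        out, _ = self.decoder(emb, state)  # (batch, tgt_len, hidden_size)
        return self.fc(out)                # (batch, tgt_len, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, x, sos_id, max_len):
        # Generate one token per step, feeding each prediction back in.
        _, state = self.encoder(x)
        token = torch.full((x.size(0), 1), sos_id, dtype=torch.long, device=x.device)
        outputs = []
        for _ in range(max_len):
            out, state = self.decoder(self.embedding(token), state)
            token = self.fc(out).argmax(dim=-1)  # (batch, 1)
            outputs.append(token)
        return torch.cat(outputs, dim=1)         # (batch, max_len)


# Shapes taken from the question; vocabulary size and ids are made up.
model = Seq2SeqModel(input_size=100, hidden_size=256, vocab_size=5000, emb_dim=64)
x = torch.randn(32, 70, 100)
y = torch.randint(0, 5000, (32, 68))
logits = model(x, y)
print(logits.shape)  # torch.Size([32, 68, 5000])
decoded = model.greedy_decode(x, sos_id=1, max_len=68)
```

Note that the output layer here has one unit per vocabulary entry rather than 68 units, since 68 in the question is the padded sentence length, not the number of distinct Spanish words. For training, a common choice is to feed y[:, :-1] to the decoder and compare the logits against y[:, 1:], so that each step predicts the next token.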