Tags: deep-learning, pytorch, nlp, recurrent-neural-network, one-hot-encoding

Why am I getting a hidden size error with a PyTorch RNN?


I am trying to build an RNN for next-word prediction, following a next-character prediction example (tutorial, github, colab (runtime ~1min)).

In the example, the input shape is (3, 14, 17) for batch_size, sequence_length, and nb_features. The hidden state is then defined as (1, 3, 12) for n_layers, batch_size, and hidden_dim.
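For reference, here is a minimal sketch of that shape convention, assuming a single-layer, unidirectional nn.RNN with batch_first=True as in the example:

```python
import torch
import torch.nn as nn

# Shapes taken from the example: batch_size=3, sequence_length=14,
# nb_features=17, n_layers=1, hidden_dim=12.
rnn = nn.RNN(input_size=17, hidden_size=12, num_layers=1, batch_first=True)

x = torch.randn(3, 14, 17)   # input:  (batch_size, sequence_length, nb_features)
h0 = torch.zeros(1, 3, 12)   # hidden: (n_layers, batch_size, hidden_dim)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([3, 14, 12])
print(hn.shape)   # torch.Size([1, 3, 12])
```

Note that the hidden state is always (n_layers, batch_size, hidden_dim); the batch_first flag only affects the layout of the input and output tensors.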

I followed this example, except that my batch_size is 1. My input sequences are also not padded, since I'm using a batch size of 1. When I run my train() method, I get an error on the very first training example:

RuntimeError: Expected hidden size (1, 25, 12), got [1, 1, 12]

(25 being my sequence length).

So it seems PyTorch is asking me to use the sequence length as a dimension of my hidden state, but in the example code I followed that isn't the case, and that code works fine.
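A minimal, hypothetical reproduction of the mismatch (nb_features=17 and hidden_dim=12 are assumed from the example above):

```python
import torch
import torch.nn as nn

# Hypothetical reproduction; nb_features=17 and hidden_dim=12 are
# assumed from the example's shapes.
rnn = nn.RNN(input_size=17, hidden_size=12, num_layers=1)

x = torch.randn(1, 25, 17)   # intended as (batch=1, seq_len=25, nb_features)
h0 = torch.zeros(1, 1, 12)   # (n_layers, batch=1, hidden_dim)

out, hn = rnn(x, h0)
# RuntimeError: Expected hidden size (1, 25, 12), got [1, 1, 12]
```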

What am I doing wrong?

Additionally, here is the colab I am using (runtime ~1min).


Solution

  • There were two main differences between the example code and mine that caused the failure (both fixes are shown in the sketch after this list):
    1. batch_first=True had to be passed to the RNN when instantiating the model.
    2. The target preprocessing had to differ from the input preprocessing: I encode words as sparse one-hot vectors, and while one-hot vectors work as input, the target had to be encoded as just the word's index in the one-hot vector rather than the whole vector.

    Thanks @erip for helping debug this!
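A minimal sketch of both fixes, using the same assumed dimensions as above (the vocabulary size and the decoder layer are illustrative, not the original code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_dim, seq_len = 17, 12, 25   # assumed from the shapes above
rnn = nn.RNN(input_size=vocab_size, hidden_size=hidden_dim,
             num_layers=1, batch_first=True)   # fix 1: batch_first=True
decoder = nn.Linear(hidden_dim, vocab_size)    # hypothetical output layer
criterion = nn.CrossEntropyLoss()

# One-hot encoded input sequence, shape (batch=1, seq_len, vocab_size)
word_ids = torch.randint(0, vocab_size, (1, seq_len))
x = F.one_hot(word_ids, num_classes=vocab_size).float()
h0 = torch.zeros(1, 1, hidden_dim)             # (n_layers, batch, hidden_dim) now matches

out, hn = rnn(x, h0)                           # out: (1, seq_len, hidden_dim)
logits = decoder(out).view(-1, vocab_size)     # (seq_len, vocab_size)

# Fix 2: targets are word *indices* (a LongTensor), not one-hot vectors,
# which is what nn.CrossEntropyLoss expects.
targets = torch.randint(0, vocab_size, (seq_len,))
loss = criterion(logits, targets)
loss.backward()
```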