I am trying to build an RNN for next word prediction, following a next character prediction example (tutorial, github, colab (runtime ~1min)).
In the example, the input shape is (3, 14, 17), i.e. (batch_size, sequence_length, nb_features). The hidden state shape is (1, 3, 12), i.e. (n_layers, batch_size, hidden_dim).
I followed this example except that my batch_size is 1. Also, my input sequences are not padded, since I'm using batch_size 1. When I run my train() method, I get an error on my first training example:
RuntimeError: Expected hidden size (1, 25, 12), got [1, 1, 12]
(25 being the sequence length).
So it seems PyTorch expects the sequence length as a dimension of my hidden state, but in the example code I followed that isn't the case and the code runs fine.
What am I doing wrong?
Additionally, here is the colab I am using (runtime ~1min).
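For context, the shape mismatch can be reproduced with a minimal sketch (the dimensions are taken from the error above; input_size=17 is assumed from the example's nb_features). Without batch_first=True, nn.RNN reads its input as (seq_len, batch, features), so a batch-first tensor of shape (1, 25, 17) is interpreted as a batch of 25:

```python
import torch
import torch.nn as nn

# Default nn.RNN expects input shaped (seq_len, batch, features).
# A (1, 25, 17) tensor intended as (batch=1, seq_len=25, features=17)
# is therefore read as batch=25, so the hidden state must be (1, 25, 12).
rnn = nn.RNN(input_size=17, hidden_size=12, num_layers=1)

x = torch.zeros(1, 25, 17)   # intended as (batch=1, seq_len=25, features=17)
h0 = torch.zeros(1, 1, 12)   # (num_layers, batch=1, hidden_dim)

try:
    rnn(x, h0)
except RuntimeError as e:
    print(e)  # Expected hidden size (1, 25, 12), got [1, 1, 12]
```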
There were 2 main differences between the example code and my code, which wouldn't run:
1- batch_first=True must be passed to the RNN when instantiating the model
2- the target preprocessing had to differ from the input preprocessing: I am using sparse one-hot vectors to encode words, and while one-hot vectors work as input, the target had to be encoded as just the index of the word in the vocabulary instead of the whole one-hot vector
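Putting both fixes together, here is a minimal sketch (hyperparameters are hypothetical, chosen to match the shapes in the question, and nn.CrossEntropyLoss is assumed as the criterion, since it is what requires class indices rather than one-hot targets):

```python
import torch
import torch.nn as nn

vocab_size, hidden_dim, seq_len = 17, 12, 25

# Fix 1: batch_first=True makes nn.RNN read input as (batch, seq_len, features)
rnn = nn.RNN(input_size=vocab_size, hidden_size=hidden_dim,
             num_layers=1, batch_first=True)
fc = nn.Linear(hidden_dim, vocab_size)
criterion = nn.CrossEntropyLoss()

# One-hot encoded input sequence, batch of 1
x = torch.zeros(1, seq_len, vocab_size)
x[0, torch.arange(seq_len), torch.randint(vocab_size, (seq_len,))] = 1.0
h0 = torch.zeros(1, 1, hidden_dim)          # (num_layers, batch=1, hidden_dim)

out, hn = rnn(x, h0)                        # out: (1, seq_len, hidden_dim)
logits = fc(out).view(-1, vocab_size)       # (seq_len, vocab_size)

# Fix 2: the target is a vector of word *indices*, not one-hot vectors
target = torch.randint(vocab_size, (seq_len,))
loss = criterion(logits, target)            # scalar loss, no shape error
```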
Thanks @erip for help debugging this!