I've been having trouble getting my data into the shape required by PyTorch's GRU.
What this network is supposed to do is take a 256-dimensional encoded vector representation of a molecule and learn to generate the corresponding SELFIES string (a text-based molecule representation), padded to a length of 128, with tokens drawn from an alphabet of 42 'letters'.
Now, I have no idea how to reshape the input tensor so that the GRU accepts it, as shown in the drawing I attached.
Thanks in advance for your help.
I tried calling unsqueeze(1) on the input tensor. This resulted in an output of shape [64, 1, 256], which in my model would be a batch of 64 one-token outputs (see the reproduction sketch after the class definition below).
import torch
import torch.nn as nn

class DecoderNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, output_len):
        super(DecoderNet, self).__init__()
        # GRU parameters
        self.input_size = input_size    # = 256
        self.hidden_size = hidden_size  # = 256
        self.num_layers = num_layers    # = 1
        # output token count (alphabet size)
        self.output_size = output_size  # = 42
        # output length, i.e. GRU time step count
        self.output_len = output_len    # = 128
        # pytorch.nn layers
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=2)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.gru(x, h)
        return out, h

    def init_hidden(self, batch_size):
        # hidden state shape: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return h0
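For reference, a minimal sketch of what I tried; the hidden state shape here is my assumption, since it is the only one the GRU accepts for this input:

import torch

model = DecoderNet(input_size=256, hidden_size=256, num_layers=1,
                   output_size=42, output_len=128)
x = torch.randn(64, 256)          # batch of 64 encoded molecule vectors
h = torch.zeros(1, 1, 256)        # (num_layers, batch_size=1, hidden_size)
out, h = model(x.unsqueeze(1), h)
print(out.shape)                  # torch.Size([64, 1, 256])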
By default, nn.GRU expects input of shape (seq_len, batch_size, input_size). You need to create the layer with batch_first=True for it to accept (batch_size, seq_len, input_size) instead.
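A minimal sketch of the batch_first variant, using the sizes from your model; note that the hidden state stays (num_layers, batch_size, hidden_size) either way:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=256, hidden_size=256, num_layers=1, batch_first=True)
x = torch.randn(64, 128, 256)     # (batch_size, seq_len, input_size)
h0 = torch.zeros(1, 64, 256)      # hidden state layout is unaffected by batch_first
out, h = gru(x, h0)
print(out.shape)                  # torch.Size([64, 128, 256])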
If your x has a shape of (batch_size, seq_len), then you first need to add the input size dimension with x = x.unsqueeze(2) to get a shape of (batch_size, seq_len, input_size=1).
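For example (shapes are illustrative):

import torch

x = torch.randn(64, 128)          # (batch_size, seq_len)
x = x.unsqueeze(2)                # add a trailing feature dimension
print(x.shape)                    # torch.Size([64, 128, 1])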
Alternatively, you can keep batch_first=False (the default) and swap the batch size and sequence length dimensions, before or after the unsqueeze(), like this:

x = x.transpose(1, 0)
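For the same illustrative shapes, this moves the sequence dimension to the front:

import torch

x = torch.randn(64, 128, 1)       # (batch_size, seq_len, input_size)
x = x.transpose(1, 0)             # swap dims 0 and 1
print(x.shape)                    # torch.Size([128, 64, 1])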
Important: Do not use reshape() or view() to "fix" the shape of x (as suggested by the title of your post), as this will mess up your tensor! Unlike transpose(), these functions do not move data between axes; they only re-slice the tensor's flat element order, so values end up attached to the wrong sequence and batch positions.