python machine-learning pytorch lstm recurrent-neural-network

PyTorch path generation with RNN - confusion with input, output, hidden and batch sizes

I followed a tutorial on sentence generation with RNN and I'm trying to modify it to generate sequences of positions, however I'm having trouble with defining the correct model parameters such as input_size, output_size, hidden_dim, batch_size.

Background:

I have 596 sequences of x,y positions, each looking like [[x1,y1],[x2,y2],...,[xn,yn]]. Each sequence represents the 2D path of a vehicle. I would like to to train a model that, given a starting point (or a partial sequence), could generate one of these sequences.

-I have padded/truncated the sequences so that they all have length 50, meaning each sequence is an array of shape [50,2]

-I then divided this data into input_seq and target_seq:

input_seq: tensor of torch.Size([596, 49, 2]). contains all the 596 sequences, each without its last position.

target_seq: tensor of torch.Size([596, 49, 2]). contains all the 596 sequences, each without its first position.

The model class:

class Model(nn.Module):
def __init__(self, input_size, output_size, hidden_dim, n_layers):
    super(Model, self).__init__()
    # Defining some parameters
    self.hidden_dim = hidden_dim
    self.n_layers = n_layers
    #Defining the layers
    # RNN Layer
    self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
    # Fully connected layer
    self.fc = nn.Linear(hidden_dim, output_size)

def forward(self, x):
    batch_size = x.size(0)      
    # Initializing hidden state for first input using method defined below
    hidden = self.init_hidden(batch_size)
    # Passing in the input and hidden state into the model and obtaining outputs
    out, hidden = self.rnn(x, hidden)
    # Reshaping the outputs such that it can be fit into the fully connected layer
    out = out.contiguous().view(-1, self.hidden_dim)
    out = self.fc(out)        
    return out, hidden

def init_hidden(self, batch_size):
    # This method generates the first hidden state of zeros which we'll use in the forward pass
    # We'll send the tensor holding the hidden state to the device we specified earlier as well
    hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
    return hidden

I instantiate the model with the following parameters:

input_size of 2 (an [x,y] position)

output_size of 2 (an [x,y] position)

hidden_dim of 2 (an [x,y] position) (or should this be 50 as in the length of a full sequence?)

model = Model(input_size=2, output_size=2, hidden_dim=2, n_layers=1)
n_epochs = 100
lr=0.01
# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Training Run
for epoch in range(1, n_epochs + 1):
    optimizer.zero_grad() # Clears existing gradients from previous epoch
    output, hidden = model(input_seq)
    loss = criterion(output, target_seq.view(-1).long())
    loss.backward() # Does backpropagation and calculates gradients
    optimizer.step() # Updates the weights accordingly
    if epoch%10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))

When I run the training loop, it fails with this error:

ValueError                                Traceback (most recent call last)
<ipython-input-9-ad1575e0914b> in <module>
      3     optimizer.zero_grad() # Clears existing gradients from previous epoch
      4     output, hidden = model(input_seq)
----> 5     loss = criterion(output, target_seq.view(-1).long())
      6     loss.backward() # Does backpropagation and calculates gradients
      7     optimizer.step() # Updates the weights accordingly
...

ValueError: Expected input batch_size (29204) to match target batch_size (58408).

I tried modifying input_size, output_size, hidden_dim and batch_size and reshaping the tensors, but the more I try the more confused I get. Could someone point out what I am doing wrong?

Furthermore, since batch size is defined as x.size(0) in Model.forward(self,x), this means I only have a single batch of size 596 right? What would be the correct way to have multiple smaller batches?

Solution

The output has size [batch_size * seq_len, 2] = [29204, 2], and you flatten the target_seq, which has size [batch_size * seq_len * 2] = [58408]. They don't have the same number of dimensions, while having the same number of total elements, therefore the first dimensions are not identical.

Regardless of the dimension mismatch, nn.CrossEntropyLoss is a categorical loss function, which means it would only predict a class from the output. You don't have any classes, but you are trying to predict coordinates, which are continuous values. For this you need to use a regression loss function, such as nn.MSELoss, which calculates the squared error/distance between the predicted and target coordinates.

criterion = nn.MSELoss()

# .flatten() does the same thing as .view(-1) but is more descriptive
loss = criterion(output.flatten(), target_seq.flatten())

The flattening can be avoided as the loss functions as well as the linear layer can operate on multidimensional inputs, which removes the potential risk of getting lost with the flattening and restoring of the dimensions, and the output is more comprehensible to inspect or use later outside of the training. For the linear layer, only the last dimension of the input needs to match the in_features of nn.Linear, which is hidden_dim in your case.

def forward(self, x):
    batch_size = x.size(0)      
    # Initializing hidden state for first input using method defined below
    hidden = self.init_hidden(batch_size)
    # Passing in the input and hidden state into the model and obtaining outputs
    # out size: [batch_size, seq_len, hidden_dim]
    out, hidden = self.rnn(x, hidden)
    # out size: [batch_size, seq_len, output_size]
    out = self.fc(out)        
    return out, hidden

Now the output of the model has the same size as the target_seq and you can directly call the loss function without flattening:

loss = criterion(output, target_seq)

hidden_dim of 2 (an [x,y] position) (or should this be 50 as in the length of a full sequence?)

The hidden_dim is not a pair of [x, y] and is completely unrelated to both the input_size and output_size. It defines the number of hidden features of the RNN, which is kind of its complexity, and bigger sizes potentially have more room to retain essential information, but also require more computations. There is no perfect hidden size and it largely depends on the use case. You can experiment with different sizes, e.g. 100, 256, etc. and see whether that improves your results.

Furthermore, since batch size is defined as x.size(0) in Model.forward(self,x), this means I only have a single batch of size 596 right? What would be the correct way to have multiple smaller batches?

Yes, you only have a single batch of size 596. If you want to use smaller batches, for example if you cannot fit all of them into a more complex model, you could easily use slices of them, but it would be better to use PyTorch's data utilities: torch.utils.data.TensorDataset to get a dataset, where each sequence of the input has a corresponding target, in combination with torch.utils.data.DataLoader to create batches for you.

from torch.utils.data import DataLoader, TensorDataset

# Match each sequence of the input_seq to the corresponding target_seq.
# e.g. dataset[0] == (input_seq[0], target_seq[0])
dataset = TensorDataset(input_seq, target_seq)

# Randomly shuffle the data and load it in batches of 16
data_loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Process one batch at a time
for input, target in data_loader:
    output, hidden = model(input)
    loss = criterion(output, target)