I am trying to solve a seq-to-seq problem with an LSTM in PyTorch. Concretely, I take sequences of 5 elements to predict the next 5. My concern has to do with the data transformations. I have tensors of size [bs, seq_length, features], where seq_length = 10 and features = 1. Each feature is an int between 0 and 3 (both included).
I believed the input data had to be transformed to the float range [0, 1] with a MinMaxScaler, in order to make the LSTM learning process easier. After that, I apply a Linear layer, which transforms the hidden states into the corresponding output of size features.
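For reference, since every feature is an int in [0, 3], the transform I call in the training loop below amounts to this sketch (my actual code uses a MinMaxScaler):

def transform(x):
    # Min-max scaling with a known value range [0, 3] reduces to
    # dividing by the max, mapping [0, 3] -> [0, 1].
    return x / 3.0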
My definition of the LSTM network in PyTorch:
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, dropout_prob):
        super(LSTM, self).__init__()
        self.lstm_layer = nn.LSTM(input_dim, hidden_dim, num_layers, dropout=dropout_prob)
        self.output_layer = nn.Linear(hidden_dim, output_dim)
        ...

    def forward(self, X):
        out, (hidden, cell) = self.lstm_layer(X)
        out = self.output_layer(out)
        return out
The code I use for the training loop is the following:
def train_loop(t, checkpoint_epoch, dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, X in enumerate(dataloader):
        X = X[0].type(torch.float).to(device)
        # X = torch.Size([batch_size, 10, input_dim])

        # Split sequences into input and target
        inputs = transform(X[:, :5, :])   # inputs = [batch_size, 5, input_dim]
        targets = transform(X[:, 5:, :])  # targets = [batch_size, 5, input_dim]

        # Predictions (forward pass) under mixed precision
        with autocast():
            pred = model(inputs)          # pred = [batch_size, 5, input_dim]
            loss = loss_fn(pred, targets)

        # Backprop; scaler is a torch.cuda.amp.GradScaler created outside this function
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            # print(f"Current loss: {loss:>7f}, [{current:>5d}/{size:>5d}]")

        # Delete variables and empty cache
        del X, inputs, targets, pred
        torch.cuda.empty_cache()
    return loss
The code I used for preprocessing the data:
def main():
    num_agents = 2
    # Open the HDF5 file
    with h5py.File('dataset_' + str(num_agents) + 'UAV.hdf5', 'r') as f:
        # Access the dataset
        data = f['data'][:]
        # Convert to PyTorch tensor
        data_tensor = torch.tensor(data)
        size = data_tensor.size()
        seq_length = 10
        reshaped = data_tensor.view(-1, size[2], size[3])
        r_size = reshaped.size()
        reshaped = reshaped[:, :, 1:]
        reshaped_v2 = reshaped.view(r_size[0], -1)
        dataset = create_dataset(reshaped_v2.numpy(), seq_length)

    dataset = TensorDataset(dataset)
    # Split the dataset into training and validation sets
    train_size = int(0.8 * len(dataset))  # 80% for training
    val_size = len(dataset) - train_size  # 20% for validation
    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
    train_dataloader = DataLoader(train_dataset, batch_size=params['batch_size'], shuffle=True, pin_memory=True)
    val_dataloader = DataLoader(val_dataset, batch_size=params['batch_size'], shuffle=False, pin_memory=True)
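Here create_dataset builds the [N, 10, 1] sequence windows; a minimal sketch of the idea (the exact implementation is omitted and may differ in details):

import numpy as np
import torch

def create_dataset(array, seq_length):
    # Slide a window of seq_length steps over each series and stack
    # the windows into one [N, seq_length, 1] float tensor.
    windows = []
    for series in array:  # array: [num_series, time_steps]
        for start in range(series.shape[0] - seq_length + 1):
            windows.append(series[start:start + seq_length])
    return torch.tensor(np.array(windows), dtype=torch.float32).unsqueeze(-1)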
Trying this, the model was not learning properly, so I thought that directly computing the loss between targets (float values in the range [0, 1]) and pred (which I believe are float values in the range [-1, 1], because of the tanh activations in the LSTM layer) might be wrong, since they are on different scales. I then tried applying a sigmoid activation right after the Linear layer in the forward pass, but it didn't learn properly either. I ran executions with many hyperparameter combinations, but none resulted in a "normal" training curve. I also attach a screenshot of a 5000-epoch run to illustrate the training process:
My questions are: is my data transformation approach correct, and why is the model not learning?
The big problem with your code is how you define your LSTM layer. nn.LSTM by default expects an input of shape (sl, bs, features), while your input is of shape (bs, sl, features). As a result, your current code is processing along the wrong dimension. You need to pass batch_first=True to nn.LSTM to use a batch-first input (lstm docs).
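Concretely, the layer definition in your module becomes:

self.lstm_layer = nn.LSTM(input_dim, hidden_dim, num_layers,
                          dropout=dropout_prob, batch_first=True)
# out now has shape (batch_size, seq_len, hidden_dim), matching your
# (batch_size, seq_len, input_dim) input.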
Additionally, your data setup is flawed. Your LSTM processes sequences one item at a time, meaning sequence element i has only seen sequence elements 0, 1, ..., i-1. But you expect this sequence element to predict element i+5 of the output sequence.
As a more concrete example: the 2nd item in your input is expected to predict the 2nd item in the output (the 7th item overall), without seeing any of the intermediate sequence items. You're trying to predict elements of the output sequence with only a fraction of the input sequence.
The best approach would be to use next-step prediction instead. This works because each step has seen all steps before it, so there's no information gap to the step being predicted. It's a simple change:
# old
inputs = transform(X[:, :5, :])
targets = transform(X[:, 5:, :])
# new
inputs = transform(X[:, :-1, :]) # all but the last step
targets = transform(X[:, 1:, :]) # all but the first step
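Note that you can still get your 5 future steps at inference time by rolling the model out autoregressively, feeding each prediction back in as input. A quick sketch (assumes the batch_first fix above; variable names are illustrative):

model.eval()
with torch.no_grad():
    seq = transform(X[:, :5, :])           # the 5 known steps
    for _ in range(5):
        next_step = model(seq)[:, -1:, :]  # prediction for the next step
        seq = torch.cat([seq, next_step], dim=1)
    future = seq[:, 5:, :]                 # the 5 generated steps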
If you really want to stick to using the first 5 steps to predict the next 5, you need a sequence-to-sequence model. This involves adding a decoder LSTM that uses the hidden state produced by the encoder LSTM (thus having information from all input time steps). Seq2seq also requires a for loop for the decoder, which is super annoying.
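A minimal sketch of that encoder-decoder setup (class and variable names are my own, and starting the decoder from a zero input is just one option):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.LSTM(output_dim, hidden_dim, num_layers, batch_first=True)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, target_len=5):
        # Encode the full input sequence; keep the final hidden state,
        # which summarizes all input time steps.
        _, (hidden, cell) = self.encoder(X)
        # Decode one step at a time, feeding each prediction back in.
        step = torch.zeros(X.size(0), 1, self.output_layer.out_features, device=X.device)
        outputs = []
        for _ in range(target_len):
            out, (hidden, cell) = self.decoder(step, (hidden, cell))
            step = self.output_layer(out)   # [batch, 1, output_dim]
            outputs.append(step)
        return torch.cat(outputs, dim=1)    # [batch, target_len, output_dim]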