
PyTorch LSTM Input


I currently have a dataset with multiple features, where each row is a time-series and each column is a time step. For example:


How should I re-shape the data so that I can properly represent the sequential information when I use a pytorch LSTM?

Currently I’ve left the data as it is, converted the features into tensors, wrapped them in Variables, and reshaped them using this code:

X_train_tensors = Variable(torch.Tensor(X_train), requires_grad=True)
X_test_tensors = Variable(torch.Tensor(X_test), requires_grad=True)

y_train_tensors = Variable(torch.Tensor(y_train), requires_grad=True)
y_test_tensors = Variable(torch.Tensor(y_test))

Final Shape Looks like:

torch.Size([num_rows, 1, num_features])

The LSTM runs fine. However, I’m worried that I haven’t captured the sequential nature of the dataset by keeping this orientation. Should I instead have treated each row as a sequence and the columns as time steps? In that case, what would the final shape look like, and how could I perform that transformation using PyTorch tools?


Solution

  • There's no point using an LSTM with your current configuration. LSTMs are useful for processing variable-length sequences. If the number of features is fixed and your tensors are all of size (num_rows, 1, num_features), you can squeeze that to (num_rows, num_features) and put them through an MLP.
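A minimal sketch of that squeeze-and-MLP alternative; the sizes (8 rows, 10 features) and the hidden width of 32 are arbitrary placeholders, not values from the question:

```python
import torch
import torch.nn as nn

num_rows, num_features = 8, 10  # stand-ins for your dataset's dimensions

# Tensors shaped (num_rows, 1, num_features), as in the question.
x = torch.randn(num_rows, 1, num_features)

# Drop the singleton middle dimension: (num_rows, num_features).
x = x.squeeze(1)

# A small MLP; layer sizes here are illustrative choices.
mlp = nn.Sequential(
    nn.Linear(num_features, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

out = mlp(x)
print(out.shape)  # torch.Size([8, 1])
```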

    If you want to use an LSTM-type approach, you would do something like this:

    • create tensors of size (num_rows, num_features) where all the features are integer values (I'm inferring this from your spreadsheet example)
    • put those tensors through a nn.Embedding layer to get tensors of size (num_rows, num_features, d_features)
    • send the tensors of size (num_rows, num_features, d_features) through the LSTM
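The three steps above can be sketched as follows; the vocabulary size, embedding dimension (d_features), and hidden size are hypothetical values chosen for illustration:

```python
import torch
import torch.nn as nn

num_rows, num_features = 8, 10       # stand-ins for your dataset's dimensions
vocab_size = 100                     # assumed upper bound on the integer feature values
d_features, hidden_size = 16, 32     # illustrative embedding and LSTM widths

# Integer-valued features of shape (num_rows, num_features).
x = torch.randint(0, vocab_size, (num_rows, num_features))

# Embed each integer: (num_rows, num_features, d_features).
embedding = nn.Embedding(vocab_size, d_features)
embedded = embedding(x)

# batch_first=True makes the LSTM expect (batch, seq_len, input_size).
lstm = nn.LSTM(input_size=d_features, hidden_size=hidden_size, batch_first=True)
output, (h_n, c_n) = lstm(embedded)

print(output.shape)  # torch.Size([8, 10, 32])
```

Here the LSTM treats the num_features dimension as the sequence axis, stepping over one embedded feature at a time.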

    That said, if the number of features for your input is fixed, there's no need to use an LSTM. LSTMs are used when you have to process variable-length sequences.

    As an aside, it looks like you're using the Variable syntax, which was deprecated several years ago - you should check out the current documentation for PyTorch.
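Since PyTorch 0.4, Variable was merged into Tensor, so you create tensors directly and set requires_grad where needed. A sketch of the modern equivalent of the question's snippet, with small placeholder data standing in for X_train and y_train:

```python
import torch

# Placeholder data standing in for the question's X_train / y_train arrays.
X_train = [[1.0, 2.0], [3.0, 4.0]]
y_train = [0.0, 1.0]

# No Variable wrapper needed; torch.tensor accepts requires_grad directly.
X_train_tensors = torch.tensor(X_train, requires_grad=True)
y_train_tensors = torch.tensor(y_train)

print(X_train_tensors.requires_grad)  # True
```

(As a side note, requires_grad is usually only needed on model parameters, not on input or target tensors.)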