machine-learning · pytorch · classification · lstm

How can I use an LSTM to classify a series of vectors into two categories in PyTorch


I have a series of vectors representing a signal over time. I'd like to classify parts of the signal into two categories: 1 or 0. The reason for using an LSTM is that I believe the network will need knowledge of the entire signal to classify correctly.

My problem is developing the PyTorch model. Below is the class I've come up with.

class LSTMClassifier(nn.Module):

   def __init__(self, input_dim, hidden_dim, label_size, batch_size):
       self.lstm = nn.LSTM(input_dim, hidden_dim)
       self.hidden2label = nn.Linear(hidden_dim, label_size)
       self.hidden = self.init_hidden()

   def init_hidden(self):
       return (torch.zeros(1, self.batch_size, self.hidden_dim), 
               torch.zeros(1, self.batch_size, self.hidden_dim))

   def forward(self, x):
       lstm_out, self.hidden = self.lstm(x, self.hidden)
       y  = self.hidden2label(lstm_out[-1])
       log_probs = F.log_softmax(y)
       return log_probs

However, this model gives a bunch of shape errors, and I'm having trouble understanding everything that's going on. I looked at this SO question first.


Solution

  • You should always follow the PyTorch documentation, especially the inputs and outputs section for nn.LSTM.

    This is what the classifier should look like:

    import torch
    import torch.nn as nn
    
    
    class LSTMClassifier(nn.Module):
        def __init__(self, input_dim, hidden_dim, label_size):
            super().__init__()
            self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
            self.hidden2label = nn.Linear(hidden_dim, label_size)
    
        def forward(self, x):
            # h_n holds the hidden state from the last timestep,
            # shape (num_layers * num_directions, batch, hidden_size)
            _, (h_n, _) = self.lstm(x)
            # flatten to (batch, hidden_size) and project to label logits
            return self.hidden2label(h_n.reshape(x.shape[0], -1))


    clf = LSTMClassifier(100, 200, 1)
    inputs = torch.randn(64, 10, 100)  # (batch, timesteps, n_features)
    clf(inputs)                        # logits of shape (64, 1)
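
    As a quick sanity check (a minimal sketch, assuming the same single-layer, unidirectional setup as above), you can verify that h_n is just the LSTM output at the final timestep, which is why reshaping it yields one feature vector of size hidden_size per sequence:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(100, 200, batch_first=True)
    x = torch.randn(64, 10, 100)                     # (batch, timesteps, n_features)
    lstm_out, (h_n, c_n) = lstm(x)

    print(lstm_out.shape)                            # torch.Size([64, 10, 200])
    print(h_n.shape)                                 # torch.Size([1, 64, 200])
    print(torch.allclose(h_n[-1], lstm_out[:, -1]))  # True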
    

    Points to consider:

    • Always use super().__init__(), as it registers submodules of your network, allows for hooks, etc.
    • Use batch_first=True so you can pass inputs of shape (batch, timesteps, n_features).
    • There is no need to init_hidden with zeros; that is the default if the hidden state is left uninitialized.
    • There is no need to pass self.hidden to the LSTM on every call. Moreover, you should not do that: it implies that elements of consecutive batches are continuations of the same sequences, while batch elements should be disjoint, and you probably do not want that.
    • _, (h_n, _) keeps only the hidden state from the last timestep, which has shape (num_layers * num_directions, batch, hidden_size). In our case num_layers and num_directions are both 1, so we get a (1, batch, hidden_size) tensor.
    • Reshape it to (batch, hidden_size) so it can be passed through the linear layer.
    • Return logits without an activation, and only one of them for the binary case. Use torch.nn.BCEWithLogitsLoss as the loss for the binary case and torch.nn.CrossEntropyLoss for the multiclass case (both expect raw logits). If you need probabilities, sigmoid is the proper activation for the binary case, while softmax or log_softmax is appropriate for multiclass.
    • For binary classification only one output is needed. Any logit below 0 (since we return unnormalized scores here) is treated as the negative class and anything above 0 as the positive class; see the training sketch after this list.
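
    To make the loss and thresholding points concrete, here is a minimal training sketch for the binary case. It assumes the LSTMClassifier defined above is in scope; the random data, optimizer choice, and hyperparameters are purely illustrative.

    import torch
    import torch.nn as nn

    clf = LSTMClassifier(100, 200, 1)
    optimizer = torch.optim.Adam(clf.parameters(), lr=1e-3)
    criterion = nn.BCEWithLogitsLoss()               # takes raw logits, applies sigmoid internally

    inputs = torch.randn(64, 10, 100)                # (batch, timesteps, n_features)
    targets = torch.randint(0, 2, (64, 1)).float()   # binary labels, same shape as the logits

    for epoch in range(5):
        optimizer.zero_grad()
        logits = clf(inputs)                         # (64, 1), unnormalized scores
        loss = criterion(logits, targets)
        loss.backward()
        optimizer.step()

    # Inference: a logit above 0 corresponds to the positive class (sigmoid(0) == 0.5)
    with torch.no_grad():
        preds = (clf(inputs) > 0).long()

    For the multiclass case you would instead set label_size to the number of classes, keep the raw logits, and use torch.nn.CrossEntropyLoss with integer class targets.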