I have a series of vectors representing a signal over time. I'd like to classify parts of the signal into two categories: 1 or 0. The reason for using an LSTM is that I believe the network will need knowledge of the entire signal to classify correctly.
My problem is developing the PyTorch model. Below is the class I've come up with.
class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, label_size, batch_size):
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (torch.zeros(1, self.batch_size, self.hidden_dim),
                torch.zeros(1, self.batch_size, self.hidden_dim))

    def forward(self, x):
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        y = self.hidden2label(lstm_out[-1])
        log_probs = F.log_softmax(y)
        return log_probs
However, this model gives a bunch of shape errors, and I'm having trouble understanding everything that's going on. I looked at this SO question first.
You should always follow the PyTorch documentation, especially the inputs and outputs sections.
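As a quick reference, this is what nn.LSTM expects and returns; a minimal shape check (dimensions are arbitrary, batch_first=True as in the classifier below):

import torch
import torch.nn as nn

# Shape check for nn.LSTM inputs and outputs (dimensions are arbitrary)
lstm = nn.LSTM(input_size=100, hidden_size=200, batch_first=True)
x = torch.randn(64, 10, 100)   # (batch, timesteps, n_features)

output, (h_n, c_n) = lstm(x)   # hidden state defaults to zeros when not passed
print(output.shape)            # torch.Size([64, 10, 200]) - output for every timestep
print(h_n.shape)               # torch.Size([1, 64, 200])  - (num_layers * num_directions, batch, hidden_size)
print(c_n.shape)               # torch.Size([1, 64, 200])  - cell state, same shape as h_n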
This is what the classifier should look like:
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, label_size):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.hidden2label = nn.Linear(hidden_dim, label_size)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.hidden2label(h_n.reshape(x.shape[0], -1))

clf = LSTMClassifier(100, 200, 1)
inputs = torch.randn(64, 10, 100)
clf(inputs)
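The call above returns raw logits, one unnormalized score per sequence in the batch:

out = clf(inputs)
print(out.shape)   # torch.Size([64, 1]) - one raw logit per sequence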
Points to consider:

- Always use super().__init__() as it registers modules in your neural network, allows for hooks, etc.
- Use batch_first=True so you can pass inputs of shape (batch, timesteps, n_features).
- There is no need to init_hidden with zeros; that is the default value when the hidden state is left uninitialized.
- There is no need to pass self.hidden to the LSTM on every call, and you should not do it: it implies that elements of consecutive batches are successive steps of the same sequence, while batch elements should be disjoint, and you probably do not need that.
- _, (h_n, _) unpacks the hidden state from the last timestep, which has shape (num_layers * num_directions, batch, hidden_size). In our case num_layers and num_directions are both 1, so we get a (1, batch, hidden_size) tensor as output.
- h_n is reshaped to (batch, hidden_size) so it can be passed through the linear layer.
- Use torch.nn.BCEWithLogitsLoss as the loss for the binary case and torch.nn.CrossEntropyLoss for the multiclass case. Also, sigmoid is the proper activation for the binary case, while softmax or log_softmax is appropriate for multiclass.
- When returning unnormalized probabilities (logits), as in this case, anything below 0 is considered negative and anything above 0 positive (see the sketch after this list).
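To make the last two points concrete, here is a minimal training/prediction sketch for the binary case, reusing the LSTMClassifier defined above; the synthetic data, target names, and optimizer choice are purely illustrative:

import torch
import torch.nn as nn

clf = LSTMClassifier(100, 200, 1)
criterion = nn.BCEWithLogitsLoss()                 # expects raw logits, applies sigmoid internally
optimizer = torch.optim.Adam(clf.parameters(), lr=1e-3)

inputs = torch.randn(64, 10, 100)                  # (batch, timesteps, n_features)
targets = torch.randint(0, 2, (64, 1)).float()     # binary labels of shape (batch, 1)

logits = clf(inputs)                               # (batch, 1), unnormalized scores
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Prediction: a logit above 0 corresponds to sigmoid(logit) > 0.5, i.e. the positive class
predictions = (clf(inputs) > 0).long()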