I have a dataset with 8 features and 4 timesteps. I am trying to implement an LSTM but need help understanding if i have set my tensor correctly. The aim is to take the outputted features from the LSTM and pass them through a NN.
My tensor shape is currently #samples x #timesteps x #features i.e. 4500x4x8. This works with the code below. I want to make sure that the model is indeed taking each timestep matrix as a new sequence (with matrix 4500x[0]x8 being the first timestep matrix and 4500x[3]x8 being the last timestep). I then take the final timestep output (output[:,-1,:] to feed through a NN.
Is the code doing what i think it is doing? I ask as performance is marginally less than a simple RF that only uses the final timestep data. This would be unexpected as the data has strong time-series correlations (it tracks patients vitals declining before going on ventilation).
I have the following code:
class LSTM1(nn.Module):
def __init__(self, num_classes, input_size, hidden_size, num_layers):
super(LSTM1, self).__init__()
self.num_classes = num_classes #number of classes
self.num_layers = num_layers #number of layers
self.input_size = input_size #input size
self.hidden_size = hidden_size #hidden state
self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
num_layers=num_layers, batch_first=True) #lstm
self.fc_1 = nn.Linear(hidden_size, 32) #fully connected 1
self.fc_2 = nn.Linear(32, 12) #fully connected 1
self.fc_3 = nn.Linear(12, 1)
self.fc = nn.Sigmoid() #fully connected last layer
self.relu = nn.ReLU()
def forward(self,x):
h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #hidden state
c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #internal state
# Propagate input through LSTM
output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
out = output[:,-1,:] #reshaping the data for Dense layer next
out = self.relu(out)
out = self.fc_1(out) #first Dense
out = self.relu(out) #relu
out = self.fc_2(out) #2nd dense
out = self.relu(out) #relu
out = self.fc_3(out) #3rd dense
out = self.relu(out) #relu
out = self.fc(out) #Final Output
return out
Your error stems from the last three lines.
Do not use ReLU activation at the end of your network
Use nn.Linear -> nn.Sigmoid with BCELoss or nn.Linear with nn.BCEWithLogitsLoss (see here for what logits are).