Tags: deep-learning, pytorch, lstm

Tensor shape for multivariable LSTM in PyTorch


I have a dataset with 8 features and 4 timesteps. I am trying to implement an LSTM but need help understanding whether I have set up my tensor correctly. The aim is to take the features output by the LSTM and pass them through an NN.

My tensor shape is currently #samples x #timesteps x #features, i.e. 4500x4x8. This works with the code below. I want to make sure that the model is indeed treating the second dimension as time, i.e. stepping through the timestep matrices in order (with x[:, 0, :] being the first timestep matrix and x[:, 3, :] being the last). I then take the final timestep output (output[:,-1,:]) to feed through an NN.
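
As a sanity check, this is how I understand batch_first=True to behave (a throwaway sketch with an arbitrary hidden size, not my actual model):

import torch
import torch.nn as nn

x = torch.randn(4500, 4, 8)                  # samples x timesteps x features
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)  # hidden_size here is arbitrary
output, (hn, cn) = lstm(x)
print(output.shape)            # torch.Size([4500, 4, 16]) - one hidden vector per timestep
print(output[:, -1, :].shape)  # torch.Size([4500, 16])   - hidden state after the final timestep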

Is the code doing what I think it is doing? I ask because performance is marginally worse than a simple RF that uses only the final timestep's data. That is unexpected, as the data has strong time-series correlations (it tracks patients' vitals declining before they go on ventilation).

I have the following code:

import torch
import torch.nn as nn
from torch.autograd import Variable  # Variable is deprecated but kept to match the original code

class LSTM1(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM1, self).__init__()
        self.num_classes = num_classes #number of classes
        self.num_layers = num_layers #number of layers
        self.input_size = input_size #input size
        self.hidden_size = hidden_size #hidden state

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                          num_layers=num_layers, batch_first=True) #lstm
        self.fc_1 =  nn.Linear(hidden_size, 32) #fully connected 1
        self.fc_2 =  nn.Linear(32, 12) #fully connected 2
        self.fc_3 = nn.Linear(12, 1) #fully connected 3 (output layer)
        self.fc = nn.Sigmoid() #sigmoid activation applied to the final output

        self.relu = nn.ReLU()
    def forward(self,x):
            h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #hidden state
            c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size)) #internal state
            # Propagate input through LSTM
            output, (hn, cn) = self.lstm(x, (h_0, c_0)) #lstm with input, hidden, and internal state
            out = output[:,-1,:] #take the hidden state from the last timestep for the dense layers
            out = self.relu(out)
            out = self.fc_1(out) #first Dense
            out = self.relu(out) #relu
            out = self.fc_2(out) #2nd dense
            out = self.relu(out) #relu
            out = self.fc_3(out) #3rd dense
            out = self.relu(out) #relu
            out = self.fc(out) #Final Output
            return out
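
For completeness, this is roughly how I instantiate and call the model (hidden_size, num_layers and the batch size below are placeholders, not my exact settings):

import torch

model = LSTM1(num_classes=1, input_size=8, hidden_size=64, num_layers=1)  # placeholder hyperparameters
x_batch = torch.randn(32, 4, 8)      # a batch of 32 sequences, 4 timesteps, 8 features
probs = model(x_batch)               # shape (32, 1), values in (0, 1) because of the final sigmoid
print(probs.shape)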

Solution

  • Error

    Your error stems from the last three lines.

    Do not use ReLU activation at the end of your network

    Use nn.Linear -> nn.Sigmoid with nn.BCELoss, or nn.Linear alone with nn.BCEWithLogitsLoss (which expects logits, i.e. the raw scores before the sigmoid).
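
    A minimal sketch of the two loss setups (standalone snippets, not from the original answer; the tensor values are made up):

import torch
import torch.nn as nn

logits = torch.tensor([[-1.2], [0.3], [2.0]])   # hypothetical raw outputs of the last nn.Linear
targets = torch.tensor([[0.], [1.], [1.]])

# Option 1: sigmoid inside the model, train with nn.BCELoss
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# Option 2: keep raw logits, train with nn.BCEWithLogitsLoss (numerically more stable)
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_bce.item(), loss_logits.item())      # equal up to floating-point error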

    What is going on

    • With ReLU, your outputs are in the range [0, +inf)
    • Applying sigmoid on top of that “squashes” them into [0.5, 1), because sigmoid(0) = 0.5 and sigmoid is increasing; even a raw score of 0 becomes probability 0.5, hence class 1 after thresholding at 0.5
    • In effect, this code always predicts class 1, which is probably not what you want (see the toy check below)
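
    A quick toy check of the ReLU -> sigmoid problem (illustrative values only):

import torch

raw = torch.tensor([-3.0, -0.5, 0.0, 0.5, 3.0])   # hypothetical outputs of the final linear layer
probs = torch.sigmoid(torch.relu(raw))             # ReLU clips negatives to 0, and sigmoid(0) = 0.5
print(probs)                                       # tensor([0.5000, 0.5000, 0.5000, 0.6225, 0.9526])
print((probs >= 0.5).long())                       # every sample ends up as class 1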