python machine-learning keras lstm recurrent-neural-network

Training a multi-variate multi-series regression problem with stateful LSTMs in Keras


I have time series from P processes, each of a different length but all with 5 variables (dimensions). I am trying to predict the estimated lifetime of a test process, and I am approaching this problem with a stateful LSTM in Keras. However, I am not sure whether my training process is correct.

I divide each sequence into non-overlapping batches (windows) of length 30, so each sequence has the shape (s_i, 30, 5), where s_i = len(P_i) // 30 differs for each of the P sequences. I append all sequences to my training data, which has the shape (N, 30, 5), where N = s_1 + s_2 + ... + s_P.
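A minimal NumPy sketch of the windowing described above, assuming each process is stored as a (T, 5) array. `split_into_windows` and the random example data are hypothetical. Once the windows are concatenated into a single (N, 30, 5) array, the boundaries between processes are no longer visible, so for stateful training it is probably worth keeping the per-process list as well.

```python
import numpy as np

WINDOW = 30     # window length used in the question
N_FEATURES = 5  # number of variables per time step

def split_into_windows(seq, window=WINDOW):
    """Cut a (T, n_features) sequence into s = T // window non-overlapping
    windows of shape (window, n_features). Leftover time steps that do not
    fill a whole window are dropped."""
    s = seq.shape[0] // window
    return seq[: s * window].reshape(s, window, seq.shape[1])

# Hypothetical data: three processes of different lengths
rng = np.random.default_rng(0)
processes = [rng.normal(size=(T, N_FEATURES)) for T in (95, 150, 61)]

# One (s_i, 30, 5) array per process
per_process = [split_into_windows(p) for p in processes]
print([w.shape for w in per_process])  # [(3, 30, 5), (5, 30, 5), (2, 30, 5)]

# Stacking gives the (N, 30, 5) array, with N = s_1 + s_2 + ... + s_P
train_X = np.concatenate(per_process, axis=0)
print(train_X.shape)  # (10, 30, 5)
```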

Model:

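from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

# design network
model = Sequential()
# batch_input_shape = (batch size 1, 30 time steps per window, 5 variables)
model.add(LSTM(32, batch_input_shape=(1, train_X[0].shape[1], train_X[0].shape[2]), stateful=True, return_sequences=True))
model.add(LSTM(16, return_sequences=False))  # not stateful: its state starts from zero on every batch
model.add(Dense(1, activation="linear"))
# recent Keras versions renamed Adam's `lr` argument to `learning_rate`
model.compile(loss='mse', optimizer=Adam(learning_rate=0.0005), metrics=['mse'])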

The output of model.summary() looks like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (1, 30, 32)               4864      
_________________________________________________________________
lstm_2 (LSTM)                (1, 16)                   3136      
_________________________________________________________________
dense_1 (Dense)              (1, 1)                    17        
=================================================================
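The parameter counts in the summary can be checked by hand. This sketch assumes the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and one bias vector); the helper names are made up for illustration.

```python
def lstm_params(units, input_dim):
    # 4 gates (input, forget, cell, output), each with an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix plus one bias per output unit
    return units * input_dim + units

p1 = lstm_params(32, 5)   # lstm_1: 5 input variables -> 32 units
p2 = lstm_params(16, 32)  # lstm_2: 32 features from lstm_1 -> 16 units
p3 = dense_params(1, 16)  # dense_1: 16 -> 1
print(p1, p2, p3, p1 + p2 + p3)  # 4864 3136 17 8017
```

The leading 1 in every output shape is the fixed batch size from `batch_input_shape`, which a stateful layer requires.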

Training loops:

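import numpy as np

for epoch in range(epochs):
    mean_tr_mse = []
    mean_tr_loss = []

    for seq in range(train_X.shape[0]):  # number of processes (24)

        # train on the whole sequence, window by window, in chronological order
        for batch in range(train_X[seq].shape[0]):  # windows in this process (e.g. 68)
            x = np.expand_dims(train_X[seq][batch], axis=0)  # shape (1, 30, 5)
            y = np.reshape(train_Y[seq][batch][-1], (1, 1))  # last-step target, shape (1, 1)
            # with metrics=['mse'], train_on_batch returns [loss, mse], not an accuracy
            b_loss, b_mse = model.train_on_batch(x, y)

            mean_tr_mse.append(b_mse)
            mean_tr_loss.append(b_loss)

        # reset LSTM internal states after training on each complete sequence
        model.reset_states()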
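A sketch of the iteration order the loop above implements, written as a plain generator so it can be inspected without Keras. It assumes `train_X` and `train_Y` are lists with one array per process and that `train_Y[seq]` holds one target per time step, as the `[-1]` indexing suggests; the function name and example data are hypothetical. Each step yields a (1, 30, 5) input, a (1, 1) target, and a flag marking the last window of a process, which is where `model.reset_states()` belongs.

```python
import numpy as np

def stateful_training_order(train_X, train_Y):
    """Yield (x, y, end_of_process) in the order a stateful LSTM expects.

    train_X: list with one array per process, each of shape (s_i, 30, 5)
    train_Y: list with one array per process, each of shape (s_i, 30)
             (assumed: one target value per time step)
    """
    for seq_X, seq_Y in zip(train_X, train_Y):
        n_windows = seq_X.shape[0]
        for b in range(n_windows):
            # a batch of one window, matching batch_input_shape=(1, 30, 5)
            x = seq_X[b][np.newaxis, ...]
            # target at the last time step of the window, shaped (1, 1)
            # to match the (1, 1) output of the Dense(1) layer
            y = np.asarray(seq_Y[b][-1], dtype="float32").reshape(1, 1)
            # True on the last window of a process: time to reset states
            yield x, y, b == n_windows - 1


# Hypothetical example: two processes with 3 and 2 windows
rng = np.random.default_rng(1)
X = [rng.normal(size=(3, 30, 5)), rng.normal(size=(2, 30, 5))]
Y = [rng.random(size=(3, 30)), rng.random(size=(2, 30))]
steps = list(stateful_training_order(X, Y))
print([end for _, _, end in steps])  # [False, False, True, False, True]
print(steps[0][0].shape, steps[0][1].shape)  # (1, 30, 5) (1, 1)
```

With the model from the question, this would be consumed as `for x, y, end in stateful_training_order(train_X, train_Y)`, calling `model.train_on_batch(x, y)` on every step and `model.reset_states()` whenever `end` is true.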

Edit:

The problem with the loss graph was that I was dividing the values in my custom loss, which made them too small. If I remove the division and plot the loss on a logarithmic scale, it looks all right.

New Problem:

Once training is done, I try to predict. I show my model 30 time samples of a new process, so the input shape is the same as the batch_input_shape used during training, i.e. (1, 30, 5). The predictions I get for different batches of the same sequence are all the same.

I think I am doing something wrong in the training process.
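For prediction, a stateful model presumably needs to see a new process the same way it saw the training processes: states reset first, then windows fed one at a time in chronological order. A minimal sketch, assuming a tf.keras (Keras 2) model that exposes `reset_states()` and `predict(x, verbose=0)`. `predict_process` and the `RunningMeanModel` stand-in, used only to show that each output depends on earlier windows, are hypothetical.

```python
import numpy as np

def predict_process(model, windows):
    """Predict one value per window of a single new process.

    model:   stateful model with reset_states() and predict(x, verbose=0)
             (assumed: a tf.keras / Keras 2 model like the one above)
    windows: array of shape (s, 30, 5), in chronological order
    """
    model.reset_states()  # start the new process from a clean state
    preds = []
    for w in windows:
        # one (1, 30, 5) window at a time, so the state left by the
        # previous window carries over, as it did during training
        out = model.predict(w[np.newaxis, ...], verbose=0)
        preds.append(float(out[0, 0]))
    model.reset_states()  # don't leak this process's state into the next one
    return np.array(preds)


class RunningMeanModel:
    """Hypothetical stand-in for a stateful model: its output depends on
    every window seen since the last reset, not only on the current one."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def reset_states(self):
        self.total, self.count = 0.0, 0

    def predict(self, x, verbose=0):
        self.total += float(x.mean())
        self.count += 1
        return np.array([[self.total / self.count]])


windows = np.stack([np.full((30, 5), v) for v in (1.0, 3.0, 5.0)])
dummy = RunningMeanModel()
preds = predict_process(dummy, windows)
print(preds)  # [1. 2. 3.]
```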

Edit 2:

The model predicts exactly the same results only if it has been trained for more than 20 epochs. Otherwise, the predicted values are very close but still slightly different. I guess this is due to some kind of overfitting.

The loss for 25 epochs looks like this: [image: loss_25epochs]


Solution

  • Usually, when the predictions are all the same, it is because your data isn't normalized. I suggest you standardize your data to mean 0 and standard deviation 1 with a simple z-score transform, i.e. (data - mean) / std. Apply this transform before both training and testing. Differences in how the data is normalized between the training and test sets can also cause problems, which may explain the discrepancy between your train and test loss. Always use the same normalization, with the same mean and std, for all your data.
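A sketch of the suggested normalization, assuming data shaped (N, 30, 5) as in the question. The mean and std are computed per feature on the training windows only and reused for the test windows, so both sets go through exactly the same transform. The last lines illustrate, with a made-up scalar weight rather than a real LSTM, one plausible reason unnormalized inputs give near-identical outputs: tanh saturates for large inputs. The helper names and the random data are hypothetical.

```python
import numpy as np

def fit_standardizer(train_windows):
    """Per-feature mean and std, computed on the training windows only.
    train_windows has shape (N, 30, 5)."""
    flat = train_windows.reshape(-1, train_windows.shape[-1])  # (N*30, 5)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(windows, mean, std):
    # (data - mean) / std, broadcast over the last (feature) axis
    return (windows - mean) / std

rng = np.random.default_rng(0)
# Hypothetical raw data with a large offset, e.g. readings around 1000
train_raw = rng.normal(loc=1000.0, scale=50.0, size=(40, 30, 5))
test_raw = rng.normal(loc=1000.0, scale=50.0, size=(10, 30, 5))

mean, std = fit_standardizer(train_raw)
train_norm = standardize(train_raw, mean, std)
test_norm = standardize(test_raw, mean, std)  # same statistics as training

# Saturation illustration with a made-up weight (not a real LSTM):
# tanh of the raw values is pinned at 1, so different inputs become
# indistinguishable, while standardized values still spread out.
w = 0.1
raw_act = np.tanh(w * train_raw)
norm_act = np.tanh(w * train_norm)
raw_spread = raw_act.max() - raw_act.min()
norm_spread = norm_act.max() - norm_act.min()
print(raw_spread, norm_spread)
```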