I have time series of P processes, each of varying length but all having 5 variables (dimensions). I am trying to predict the estimated lifetime of a test process. I am approaching this problem with a stateful LSTM in Keras, but I am not sure if my training process is correct.
I divide each sequence into batches of length 30. So each sequence has the shape (s_i, 30, 5), where s_i is different for each of the P sequences (s_i = len(P_i)//30). I append all sequences into my training data, which has the shape (N, 30, 5), where N = s_1 + s_2 + ... + s_P.
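A minimal NumPy sketch of the windowing described above, assuming each process is a (T, 5) array and that the trailing T % 30 samples are simply dropped (which is what s_i = len(P_i)//30 implies). The names split_into_windows, processes and train_X_stacked are hypothetical, and the data is random placeholder data.

```python
import numpy as np

def split_into_windows(process, window=30):
    """Cut one process of shape (T, n_vars) into non-overlapping windows.

    Returns an array of shape (T // window, window, n_vars). The last
    T % window samples are dropped (assumption: matches s_i = len(P_i)//30).
    """
    process = np.asarray(process)
    s_i = len(process) // window
    return process[: s_i * window].reshape(s_i, window, process.shape[1])

# Hypothetical example: three processes of different lengths, 5 variables each
rng = np.random.default_rng(0)
processes = [rng.normal(size=(T, 5)) for T in (95, 300, 61)]

windows_per_process = [split_into_windows(p) for p in processes]
train_X_stacked = np.concatenate(windows_per_process, axis=0)  # shape (N, 30, 5)

print([w.shape[0] for w in windows_per_process])  # [3, 10, 2]
print(train_X_stacked.shape)                      # (15, 30, 5)
```

Note that the training loop further down indexes train_X[seq][batch], i.e. it expects the per-process list (like windows_per_process), not the stacked (N, 30, 5) array.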
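from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

# design network
model = Sequential()
# stateful first layer: batch size 1, input of (30 time steps, 5 variables)
model.add(LSTM(32, batch_input_shape=(1, train_X[0].shape[1], train_X[0].shape[2]), stateful=True, return_sequences=True))
# this layer is not stateful, so its state is not carried across batches
model.add(LSTM(16, return_sequences=False))
model.add(Dense(1, activation="linear"))
model.compile(loss='mse', optimizer=Adam(learning_rate=0.0005), metrics=['mse'])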
The output of model.summary() looks like this:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (1, 30, 32)               4864
_________________________________________________________________
lstm_2 (LSTM)                (1, 16)                   3136
_________________________________________________________________
dense_1 (Dense)              (1, 1)                    17
=================================================================
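As a sanity check on the summary, the Param # column follows from the standard LSTM parameter count, 4 * (units * (input_dim + units) + units), assuming Keras' default use_bias=True. The batch size of 1 does not affect the parameter count. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return units * (input_dim + 1)

print(lstm_params(5, 32))   # lstm_1: 4864
print(lstm_params(32, 16))  # lstm_2: 3136
print(dense_params(16, 1))  # dense_1: 17
```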
import numpy as np

for epoch in range(epochs):
    mean_tr_acc = []   # per-batch value of the 'mse' metric (not an accuracy)
    mean_tr_loss = []
    for seq in range(len(train_X)):  # one process at a time (24 in my data)
        # train on the whole sequence, window by window, in order
        for batch in range(train_X[seq].shape[0]):  # s_i windows (e.g. 68)
            x = np.expand_dims(train_X[seq][batch], axis=0)   # shape (1, 30, 5)
            y = np.reshape(train_Y[seq][batch][-1], (1, 1))   # shape (1, 1) target
            b_loss, b_acc = model.train_on_batch(x, y)
            mean_tr_acc.append(b_acc)
            mean_tr_loss.append(b_loss)
        # reset LSTM internal states after training on each complete sequence
        model.reset_states()
The problem with the loss graph was that I was dividing the values in my custom loss, which made them too small. If I remove the division and plot the loss on a logarithmic scale, it looks alright.
Once training is done, I try to predict. I show my model 30 time samples of a new process, so the input shape is the same as the batch_input_shape used during training, i.e. (1, 30, 5). The predictions I get for different batches of the same sequence are all the same.
I think I am doing something wrong in the training process.
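For reference, a sketch of prediction that mirrors the training loop: reset the state at the start of a new process, then feed its windows in order with batch size 1. predict_process is a hypothetical helper; it assumes tf.keras 2.x, where Model.reset_states() and Model.predict_on_batch() exist (the question's training loop already relies on reset_states()).

```python
import numpy as np

def predict_process(model, windows):
    """Feed one new process to a stateful model, one (1, 30, 5) window at a time.

    `model` is assumed to be the trained stateful Keras model (batch size 1);
    `windows` has shape (s, 30, 5).
    """
    model.reset_states()  # start the new process from a clean state
    preds = []
    for window in windows:
        # add the batch axis: (30, 5) -> (1, 30, 5)
        out = model.predict_on_batch(window[np.newaxis, ...])
        preds.append(float(np.asarray(out).ravel()[0]))
    model.reset_states()  # don't leak this process's state into the next one
    return preds
```

Resetting both before and after keeps predictions for one process independent of whatever was fed to the model earlier.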
It turns out the model predicts exactly the same results only if it has been trained for more than 20 epochs. Otherwise the predicted values are very close but still slightly different. I guess this is due to some kind of overfitting.
Usually when the predictions are all the same, it's because your data isn't normalized. I suggest you standardize your data to mean 0 and standard deviation 1 with a simple z-score transform, i.e. (data - mean) / std. Transform it like this before both training and testing. Normalizing the training and test sets differently can also cause problems, which may be the cause of your discrepancy between training and test loss. Always use the same normalization for all your data.
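A minimal sketch of that standardization for (N, 30, 5) data, computing one mean and std per variable over all windows and time steps. Computing the statistics on the training data only and reusing them for the test data is a common convention assumed here; fit_standardizer and standardize are hypothetical names, and the arrays are random placeholders. If your training data is a list of per-process arrays, concatenate it first to compute the statistics.

```python
import numpy as np

def fit_standardizer(train_X):
    # per-variable statistics over all windows and time steps: axes (0, 1)
    mean = train_X.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, n_vars)
    std = train_X.std(axis=(0, 1), keepdims=True)
    std[std == 0] = 1.0  # guard against constant variables
    return mean, std

def standardize(X, mean, std):
    # the SAME training statistics are applied to training and test data
    return (X - mean) / std

rng = np.random.default_rng(1)
train_X = rng.normal(loc=50.0, scale=10.0, size=(15, 30, 5))
test_X = rng.normal(loc=50.0, scale=10.0, size=(4, 30, 5))

mean, std = fit_standardizer(train_X)
train_Xn = standardize(train_X, mean, std)
test_Xn = standardize(test_X, mean, std)
```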