I'm working on a project that involves signal classification. I'm trying different ANN models in Keras to see which works best, focusing for now on simple networks, but I'm struggling with the LSTM one. I'm following this example: https://machinelearningmastery.com/how-to-develop-rnn-models-for-human-activity-recognition-time-series-classification/.
My inputs are 1D signals from an electronic sensor, to be classified into 3 different categories. Here is one signal from each of two different categories, so you can see they differ considerably over time: https://www.dropbox.com/s/9ctdegtuyjamp48/example_signals.png?dl=0
To start, we are trying the following simple model. Since the signals have different lengths, a masking process has been applied: each one is padded out to the length of the longest with the mask value -1000 (a value that cannot occur in our signals). The data is then reshaped from 2D to 3D (the LSTM layer requires 3D input) using the following command, since I have only one feature:
Inputs = Inputs.reshape((Inputs.shape[0],Inputs.shape[1],1))
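For anyone reproducing this, the padding and reshape step can be sketched as follows (the signal values here are made up for illustration; the key points are the -1000 padding value and the final (samples, timesteps, features) shape):

```python
import numpy as np

# Hypothetical example: three 1D signals of different lengths
signals = [np.array([0.5, 0.7, 0.9]),
           np.array([1.2, 1.1]),
           np.array([0.3, 0.4, 0.6, 0.8])]

num_steps = max(len(s) for s in signals)

# pad each signal at the end with the mask value -1000
Inputs = np.full((len(signals), num_steps), -1000.0)
for i, s in enumerate(signals):
    Inputs[i, :len(s)] = s

# reshape 2D -> 3D (samples, timesteps, features), one feature per timestep
Inputs = Inputs.reshape((Inputs.shape[0], Inputs.shape[1], 1))
```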
Then the data is split into training and validation sets and fed into the following model:
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=-1000, input_shape=(num_steps, 1)))  # skip padded timesteps
model.add(LSTM(20, return_sequences=False))
model.add(Dense(15, activation='sigmoid'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
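Note that with categorical_crossentropy the labels must be one-hot encoded to match the 3-unit softmax output. A minimal NumPy sketch of that encoding (keras.utils.to_categorical does the same thing):

```python
import numpy as np

# Hypothetical integer labels for the 3 categories
labels = np.array([0, 2, 1, 0])

# one-hot encode: row i is all zeros except a 1 at position labels[i]
num_classes = 3
one_hot = np.eye(num_classes)[labels]
```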
However, for some reason, every time the network is trained it predicts that ALL signals belong to the same category (possibly a different category on each run), usually the one with the most cases in the input data. Even if I force the network to train on the same amount of data for each category, I get the same result.
I don't think this behaviour is normal: poor accuracy could happen, but this must be down to some elementary error in the model that I'm not spotting, since the training data is fed in correctly; no errors there, rechecked multiple times. Does anyone have any idea why this is happening? Let me know if any more info would be useful to add to this post.
Just for any curious reader: in the end I resolved it by normalizing the data (per sequence, leaving the -1000 padding untouched):
import numpy as np

def LSTM_input_normalize(inputs):
    new_inputs = []
    for in_ in inputs:
        if -1000 in in_:
            start_idx = np.where(in_ == -1000)[0][0]  # index of the first "-1000" in the sequence
        else:
            start_idx = in_.shape[0]
        # compute mean and std of the current sequence
        curr_mean = np.mean(in_[:start_idx])
        curr_std = np.std(in_[:start_idx])
        # normalize the single sequence
        in_[:start_idx] = (in_[:start_idx] - curr_mean) / curr_std
        new_inputs.append(in_)
    return np.array(new_inputs)
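To see what this does on a single padded sequence: the valid part becomes zero-mean with unit standard deviation, while the -1000 padding is left alone (the example values below are made up):

```python
import numpy as np

seq = np.array([2.0, 4.0, 6.0, -1000.0, -1000.0])
start_idx = np.where(seq == -1000)[0][0]  # first padding position

# standardize only the valid part of the sequence
mean, std = np.mean(seq[:start_idx]), np.std(seq[:start_idx])
seq[:start_idx] = (seq[:start_idx] - mean) / std
```

After this, the real samples are on a scale the LSTM can learn from, and the Masking layer still recognizes the untouched -1000 padding.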