python · keras · deep-learning · lstm · data-processing

How to adjust data for an MLP to an LSTM (expected ndim=3, found ndim=2 error)


I have data that works on a Multi-Layer Perceptron architecture; it looks like this:

X_train_feature.shape
(52594, 16)

X_train_feature[0]
array([1.18867208e-03, 1.00000000e+00, 8.90000000e+01, 8.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00])

y_train.shape
(52594, 2)

y_train[0].toarray()
array([[0., 1.]])

The first dimension is the number of samples; the second dimension is the number of features for X_train and the one-hot encoding for y_train.

And I want to use the same data with an LSTM/Bi-LSTM, so I copied code from the internet and changed the input shape to the same as the MLP's:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

def define_model():
    model = Sequential()
    model.add(LSTM(20, input_shape=X_train_feature[0].shape, return_sequences=True))
    model.add(TimeDistributed(Dense(1, activation='sigmoid')))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])  # compile
    print('Total params: ', model.count_params())
    return model

But when I try to create the model, the following error about the input shape appears:

model = define_model()
ValueError: Input 0 is incompatible with layer lstm_30: expected ndim=3, found ndim=2

How should I adjust my data to apply it to an LSTM, or do I need to change the architecture configuration? Thank you so much.


Solution

  • An LSTM (unlike a perceptron) is not a feed-forward network. It needs a history to predict the next point. So, a proper input to an LSTM should have the shape (timesteps, num_features) per sample, meaning that each sample is a sequence of timesteps observations; the cell state is initialized at the first observation of the sequence and carried through the entire sequence.

    Therefore, the input tensor should have the shape (num_sequences, seq_length, num_features), as shown in the sketch below, where:

    • num_sequences: number of samples, i.e. how many sequences do you have for training the model?

    • seq_length: How long these sequences are. For variable-length sequences, you can supply None.

    • num_features: How many features does a single observation in a given sequence have?
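  To make this concrete, here is a minimal, self-contained sketch (with random stand-in data and plain keras imports, since the original pipeline isn't shown). Because the data above has no natural time dimension, one simple option is to add a timesteps axis of length 1, reshaping (52594, 16) into (52594, 1, 16); dropping return_sequences and TimeDistributed makes the model emit one prediction per sequence, matching y_train's (52594, 2) one-hot targets:

  import numpy as np
  from keras.models import Sequential
  from keras.layers import LSTM, Dense

  # Stand-in data with the same shapes as in the question.
  num_samples, num_features = 52594, 16
  X_train_feature = np.random.rand(num_samples, num_features).astype('float32')
  y_train = np.eye(2)[np.random.randint(0, 2, size=num_samples)]  # one-hot, (52594, 2)

  # Add a timesteps axis: (num_sequences, seq_length, num_features).
  seq_length = 1
  X_train_seq = X_train_feature.reshape(num_samples, seq_length, num_features)

  def define_model():
      model = Sequential()
      # input_shape is (timesteps, features); without return_sequences the LSTM
      # returns only its last output, i.e. one vector per sequence.
      model.add(LSTM(20, input_shape=(seq_length, num_features)))
      model.add(Dense(2, activation='softmax'))  # 2 classes to match the one-hot targets
      model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
      print('Total params: ', model.count_params())
      return model

  model = define_model()
  model.fit(X_train_seq, y_train, epochs=1, batch_size=128)

  Note that with seq_length = 1 the LSTM sees no history, so it behaves much like a dense layer; if the rows really form ordered series, windowing them into longer sequences (seq_length > 1) is what lets the LSTM exploit its recurrence.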