Tags: python, machine-learning, keras, lstm

LSTM overfitting but validation accuracy not improving


The task I am trying to do is to classify EEG signals into 4 possible classes. The data is divided up into trials. Subjects were asked to think about doing one of four actions, and the classification task is to predict what they were thinking based on the EEG signals.

I have ~2500 trials. For each trial, there are 22 channels of EEG sensor inputs and 1000 time steps. My baseline is a single layer MLP, and I get ~45% validation accuracy.

Since keras LSTM requires one-hot-encoded vectors for y, I mapped the labels 0, 1, 2, 3 to their corresponding one-hot encodings before training (y_total_new). At first I manually created an 80/20 train/test split, but then opted to let keras do the split (validation_split=0.2).
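For reference, the label mapping described above can be done in one line with NumPy (or equivalently with `keras.utils.to_categorical`); the labels below are made up for illustration:

```python
import numpy as np

# Hypothetical integer labels for five trials (classes 0-3)
y_total = np.array([0, 2, 1, 3, 2])

# One-hot encode: row i is the one-hot vector for label y_total[i]
y_total_new = np.eye(4)[y_total]

print(y_total_new[1])  # [0. 0. 1. 0.]
```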

This is my first LSTM experiment ever. I chose 100 units to begin with. I added a fully connected layer with four neurons in order to map to output classes, and used categorical_crossentropy for my loss function. So far with the LSTM, I can't get above 25% validation accuracy. If I run the following code for 50 epochs instead of 3, the LSTM overfits the data but the validation accuracy stays around 0.25.

Since this is my first time using an LSTM, I'm wondering if someone could shed insight into design cues I might have missed or point me in the right direction.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.layers import LSTM


time_steps = 1000  # samples per trial
n_features = 22    # EEG channels

model = Sequential()
model.add(LSTM(1000, return_sequences=False, input_shape=(time_steps, n_features)))
model.add(Dropout(0.2))
model.add(Dense(22, activation='tanh'))
model.add(Dense(4, activation='sigmoid'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(X, y_total_new, validation_split=0.2, batch_size=16, epochs=50)
#score = model.evaluate(X_test, y_test_new, batch_size=16)

Solution

  • Have you tried adding convolutional layers as the first layers of your model? I am currently using this approach to classify EMG signals into 53 classes. The convolutional layers are supposed to automatically learn features from the data and then feed them to the LSTM layers. There are several possible architectures; DeepConvLSTM is one of them:

    (figure: DeepConvLSTM architecture)

    DeepConvLstm paper: Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition
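    As a rough sketch of that idea in Keras (the filter counts, kernel sizes, and LSTM width here are my own guesses, not the paper's exact configuration), Conv1D layers extract local temporal features from the raw channels before the LSTM; note also that with categorical_crossentropy the output layer is usually softmax rather than sigmoid:

    ```python
    from keras.models import Sequential
    from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout

    time_steps = 1000  # samples per trial
    n_features = 22    # EEG channels

    model = Sequential()
    # Conv1D learns local temporal filters over the raw EEG channels
    model.add(Conv1D(64, kernel_size=5, activation='relu',
                     input_shape=(time_steps, n_features)))
    model.add(MaxPooling1D(pool_size=2))  # downsample the time axis
    model.add(Conv1D(64, kernel_size=5, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    # LSTM summarizes the learned feature sequence
    model.add(LSTM(64, return_sequences=False))
    model.add(Dropout(0.5))
    # softmax output for 4 mutually exclusive classes
    model.add(Dense(4, activation='softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    ```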