Tags: many-to-many, conv-neural-network, lstm

LSTM Class Imbalance


TLDR

I extract features using a CNN and feed them into a separate many-to-many LSTM model. It is a binary classification task, but the ratio of class 1 to class 2 is 1:4, which results in overfitting during training.

From what I understand, if I remove class 2 data so that the ratio is 1:1 and shuffle the data together with the labels (so the model is not simply guessing class 1 for the first half of the samples), I will ruin the temporal sequence of the data, and the model will no longer be able to make good predictions based on adjacent frames.

How can I solve this?

------------------------------------------------------------------------------------------------------------------------------------

The idea is to create a Flappy Bird bot that takes in a sequence of 5 images and predicts whether to jump or not. To achieve this, I took as input 20,000 images of gameplay captured at around 10 fps and extracted the features from the Flatten() layer of the following architecture.

Input: (samples, height, width, channels), in other words (20,000, 250, 150, 3)

Features: (20,000, 14336)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()

model.add(Conv2D(64, (3, 3), padding='same', input_shape=(250, 150, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.summary()

rmsprop = optimizers.RMSprop(learning_rate=0.0001)
model.compile(loss='binary_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

callbacks = [EarlyStopping(monitor='val_accuracy', min_delta=0.0001, mode='max', patience=10)]

I then reshaped the features to (4,000, 5, 14336), where 5 is my time step, and reshaped the labels to (4,000, 5, 2), with jump and no-jump one-hot encoded. These sequences are the input to the many-to-many LSTM.
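The windowing step above can be sketched with NumPy. The arrays below are tiny stand-ins chosen so the example runs quickly; the real shapes would be (20000, 14336) for the features and (20000, 2) for the labels.

```python
import numpy as np

# Tiny stand-ins for the real arrays: 20 "frames" with 16 features each.
features = np.arange(20 * 16, dtype=np.float32).reshape(20, 16)
labels = np.zeros((20, 2), dtype=np.float32)

# Group consecutive frames into non-overlapping windows of 5 time steps;
# the temporal order within each window is untouched.
time_steps = 5
n_seq = features.shape[0] // time_steps  # 20000 // 5 = 4000 for the real data

seq_features = features.reshape(n_seq, time_steps, -1)  # real: (4000, 5, 14336)
seq_labels = labels.reshape(n_seq, time_steps, -1)      # real: (4000, 5, 2)

print(seq_features.shape, seq_labels.shape)  # (4, 5, 16) (4, 5, 2)
```

Because the reshape only groups rows, frame 5 of the flat array becomes the first time step of the second sequence, and so on.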

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()

model.add(LSTM(256, return_sequences=True, input_shape=(5, 14336), dropout=0.5))
model.add(Dense(256, activation='relu'))  # applied independently at each of the 5 time steps
model.add(Dropout(0.5))

model.add(Dense(2, activation='softmax'))
model.summary()

rmsprop = optimizers.RMSprop(learning_rate=0.0001)
# categorical_crossentropy matches the 2-way one-hot (jump / no-jump) labels
model.compile(loss='categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])

callbacks = [EarlyStopping(monitor='val_accuracy', min_delta=0.0001, mode='max', patience=10)]

As expected, the model overfitted and I stopped the training:

Epoch 6/50
3232/3232 [==============================] - 138s 43ms/step - loss: 0.5020 - acc: 0.8058 - val_loss: 0.5541 - val_acc: 0.7735

As mentioned, I did consider balancing the classes and shuffling the data (neither of which I have done), since to my understanding this would destroy the sequence of images, which would defeat the whole point of using an LSTM.
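One way to address both concerns at once, assuming the data is already windowed into (sequences, 5, features) arrays, is to shuffle at the sequence level (frame order inside each 5-frame window stays intact) and to counter the 1:4 imbalance with per-timestep sample weights instead of discarding data. This is a sketch with tiny stand-in arrays, not the asker's actual pipeline; Keras can consume such a weight matrix via `model.fit(..., sample_weight=...)`, with older versions additionally requiring `sample_weight_mode='temporal'` in `compile()`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: 8 sequences of 5 time steps; the real arrays would be
# (4000, 5, 14336) features and (4000, 5, 2) one-hot labels.
seq_features = rng.normal(size=(8, 5, 6)).astype(np.float32)

# Build labels with an exact 1:4 jump / no-jump ratio (8 of 40 frames jump).
flat = np.zeros(40, dtype=np.float32)
flat[::5] = 1.0
seq_labels = np.stack([1.0 - flat, flat], axis=-1).reshape(8, 5, 2)

# 1) Shuffle whole sequences: within-window temporal order is preserved.
perm = rng.permutation(len(seq_features))
seq_features, seq_labels = seq_features[perm], seq_labels[perm]

# 2) Per-timestep weights inversely proportional to class frequency.
class_counts = seq_labels.sum(axis=(0, 1))                  # [32., 8.]
class_weights = class_counts.sum() / (2.0 * class_counts)   # [0.625, 2.5]
sample_weight = (seq_labels * class_weights).sum(axis=-1)   # shape (8, 5)

# Rare "jump" frames now contribute 4x as much to the loss as common frames,
# and the mean weight stays 1.0, leaving the overall loss scale unchanged.
```

The weight matrix would then be passed alongside the sequences, e.g. `model.fit(seq_features, seq_labels, sample_weight=sample_weight, ...)`.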

I have tried using just a CNN trained on individual frames, with shuffling and balanced classes, and it achieved 0.90 val_acc and 0.95 acc, which is pretty decent. However, when I tested the bot, it failed to last more than 6 seconds; its high score was just 3.

Any help is appreciated.


Solution

  • I have closed this question and opened a more detailed one