keras · lstm · conv-neural-network

Should CNN layers come before Bi-LSTM or after?


I'm trying to build a univariate time-series forecasting model. The current architecture looks like this:

model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_steps_in, n_features)))
model.add(Bidirectional(LSTM(20, return_sequences=True)))
model.add(Conv1D(64, 3, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(n_steps_out))

Then I tried the following, which places all CNN layers before Bi-LSTM layers (but doesn't work):

model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(Conv1D(64, 3, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Bidirectional(LSTM(20, input_shape=(n_steps_in, n_features), return_sequences=True)))
model.add(Bidirectional(LSTM(20, return_sequences=True)))
model.add(Dense(100, activation='relu'))
model.add(Dense(n_steps_out))

The latter implementation doesn't seem to work. Any suggestions for fixing it? Another question I had: is there a single-method approach to decide whether the CNN layers should come before the Bi-LSTM layers or vice versa?


Solution

  • Your network receives sequences as input and produces sequences as output, so you need to take care of dimensionality. To do this, play with padding in the convolutional layers and with the pooling operation. You also need to set return_sequences=True in your last LSTM cell (you are predicting a sequence). In the example below I use your network with padding and delete the Flatten layer, which destroys the 3D dimensionality.
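    The timestep bookkeeping can be checked with plain arithmetic before building the model. A minimal sketch (the helper function names are mine, not Keras's):

    ```python
    import math

    def conv1d_same_len(length, stride=1):
        """Output timesteps of a Conv1D layer with padding='same'."""
        return math.ceil(length / stride)

    def maxpool1d_len(length, pool_size=2):
        """Output timesteps of MaxPooling1D (strides defaults to pool_size)."""
        return length // pool_size

    n_steps_in, n_steps_out = 30, 15
    t = conv1d_same_len(n_steps_in)    # 30: 'same' padding preserves the length
    t = conv1d_same_len(t)             # 30: second conv, still unchanged
    t = maxpool1d_len(t, pool_size=2)  # 15: pooling halves the timesteps
    assert t == n_steps_out            # matches the target sequence length
    ```

    This is why pool_size=2 works here: it maps n_steps_in=30 down to exactly n_steps_out=15, so the Dense(1) head can predict one value per remaining timestep.
    
    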

    You can apply convolution before or after the LSTM. The best way to decide is to try both and evaluate the performance on a trustworthy validation set.

    CNN + LSTM

    # imports needed to run the example
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Conv1D, MaxPooling1D, Dense,
                                         LSTM, Bidirectional)

    n_sample = 100
    n_steps_in, n_steps_out, n_features = 30, 15, 1
    X = np.random.uniform(0, 1, (n_sample, n_steps_in, n_features))
    y = np.random.uniform(0, 1, (n_sample, n_steps_out, n_features))

    model = Sequential()
    # padding='same' keeps the timestep dimension at n_steps_in
    model.add(Conv1D(64, 3, activation='relu', padding='same',
                     input_shape=(n_steps_in, n_features)))
    model.add(Conv1D(64, 3, padding='same', activation='relu'))
    # pooling halves the timesteps: 30 -> 15 = n_steps_out
    model.add(MaxPooling1D(pool_size=2))
    model.add(Bidirectional(LSTM(20, return_sequences=True)))
    model.add(Bidirectional(LSTM(20, return_sequences=True)))
    model.add(Dense(1))
    model.compile('adam', 'mse')
    model.summary()

    model.fit(X, y, epochs=3)
    

    LSTM + CNN

    model = Sequential()
    model.add(Bidirectional(LSTM(20, return_sequences=True),
                            input_shape=(n_steps_in, n_features)))
    model.add(Bidirectional(LSTM(20, return_sequences=True)))
    # pooling halves the timesteps: 30 -> 15 = n_steps_out
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(64, 3, activation='relu', padding='same'))
    model.add(Conv1D(64, 3, padding='same', activation='relu'))
    model.add(Dense(1))
    model.compile('adam', 'mse')
    model.summary()

    model.fit(X, y, epochs=3)