tensorflow, time-series, lstm, tf.keras, conv-neural-network

How to combine independent CNN and LSTM networks


I'm currently working on timeseries forecasting using TensorFlow and Keras. I built a CNN that performs quite well and a basic LSTM that also shows quite good results. Now I'm thinking about combining the strengths of both networks. My first thought was to just stack the LSTM on top of the CNN, but apart from the weak results, I realized that I want both networks to see the input data, so the CNN can learn about features while the LSTM focuses on the time-related aspects. What would be a good starting point for building this kind of architecture? I was also wondering whether it makes any sense to concatenate the outputs of both networks. I see this often, but I don't get why it would be useful; I always think of it as concatenating two different timeseries, which would not make sense at all. I already visited posts that seemed related to my question, but they were not what I was looking for.


Solution

    • If you work with Keras, you should implement your model with the functional API or by subclassing tf.keras.Model (a rough functional-API sketch of a two-branch model follows this list).
    • Concatenating the outputs of both nets is a good approach (it is like different people looking at the same object and trying to figure out what it is, so the combined result tends to be more accurate).
    • If you want, you can try other approaches to merging the features (rough sketches of both options are at the end of this answer):
      • A weighted sum with learnable weights is a good and simple option
      • Using an attention mechanism can also give good results
    • Another good option might be to train both nets separately and then ensemble the results of both worlds (a small sketch of this is at the end as well).
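
    As an example of the functional API route, a two-branch model could look roughly like this. It is only a sketch: the single Conv1D block, the GlobalAveragePooling1D over timesteps, and the constants are placeholder choices of mine:

    import tensorflow as tf

    TIMESTEPS = 16
    FEATURES = 32
    N_CLASSES = 3

    inputs = tf.keras.Input(shape=(TIMESTEPS, FEATURES))

    # CNN branch: Conv1D -> BatchNorm -> ReLU on the raw input
    conv_x = tf.keras.layers.Conv1D(64, 7, padding='same')(inputs)
    conv_x = tf.keras.layers.BatchNormalization()(conv_x)
    conv_x = tf.keras.layers.ReLU()(conv_x)

    # LSTM branch: also fed the raw input
    lstm_x = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)

    # Merge along the channel axis, pool over time, classify
    merged = tf.keras.layers.Concatenate(axis=-1)([conv_x, lstm_x])
    pooled = tf.keras.layers.GlobalAveragePooling1D()(merged)
    outputs = tf.keras.layers.Dense(N_CLASSES, activation='softmax')(pooled)

    functional_model = tf.keras.Model(inputs, outputs)
    functional_model.summary()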

    I attach a simple model example using two branches (CNN and LSTM), built by subclassing tf.keras.Model:

    import tensorflow as tf
    
    
    class CNNLSTMTimeseries(tf.keras.Model):
    
        def __init__(self, n_classes):
            super(CNNLSTMTimeseries, self).__init__()
    
            # CNN branch: two Conv1D + BatchNorm blocks (ReLU is applied in call)
            self.conv1 = tf.keras.layers.Conv1D(64, 7, padding='same',
                                                activation=None)
            self.bn1 = tf.keras.layers.BatchNormalization()

            self.conv2 = tf.keras.layers.Conv1D(64, 5, padding='same',
                                                activation=None)
            self.bn2 = tf.keras.layers.BatchNormalization()

            # LSTM branch, returning the full output sequence
            self.lstm = tf.keras.layers.LSTM(64, return_sequences=True)

            # Classification head applied to the merged features
            self.classifier = tf.keras.layers.Dense(n_classes, activation='softmax')
    
        def call(self, x):
            # CNN branch: Conv1D -> BatchNorm -> ReLU, applied to the raw input
            conv_x = tf.nn.relu(self.bn1(self.conv1(x)))
            conv_x = tf.nn.relu(self.bn2(self.conv2(conv_x)))

            # LSTM branch: also sees the raw input, not the CNN features
            lstm_x = self.lstm(x)

            # Merge both branches along the channel axis
            x = tf.concat([conv_x, lstm_x], axis=-1)
            x = tf.reduce_mean(x, axis=1) # Average all timesteps

            return self.classifier(x)
    
    
    # Quick smoke test on a random batch of shape (1, TIMESTEPS, FEATURES)
    TIMESTEPS = 16
    FEATURES = 32
    model = CNNLSTMTimeseries(3)
    print(model(tf.random.uniform([1, TIMESTEPS, FEATURES])).shape)  # -> (1, 3)
    

    The example is really simple and probably won't work as well as a carefully studied architecture. You should modify the example and add max pooling, dropout, etc.
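
    For the weighted sum with learnable weights, one simple option (the layer name and the single-scalar parameterization are just my illustration) is a tiny custom layer that mixes the two branch outputs with one trainable weight:

    import tensorflow as tf

    class WeightedSumMerge(tf.keras.layers.Layer):
        """Mixes two same-shaped branch outputs with one learnable weight."""

        def build(self, input_shape):
            # A single scalar logit, squashed into (0, 1) by a sigmoid in call()
            self.logit = self.add_weight(name='merge_logit', shape=(),
                                         initializer='zeros', trainable=True)

        def call(self, inputs):
            a, b = inputs
            w = tf.sigmoid(self.logit)
            return w * a + (1.0 - w) * b

    # Usage inside the model, in place of tf.concat:
    # merged = WeightedSumMerge()([conv_x, lstm_x])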
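
    For the attention option, one possibility is a cross-attention merge where the LSTM sequence attends over the CNN features; the num_heads and key_dim values below are arbitrary placeholders:

    import tensorflow as tf

    def attention_merge(lstm_x, conv_x):
        # Cross-attention: the LSTM timesteps query the CNN feature sequence
        attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
        attended = attn(query=lstm_x, value=conv_x)  # (batch, timesteps, channels)
        # Keep both the LSTM features and the attended CNN features
        return tf.keras.layers.Concatenate(axis=-1)([lstm_x, attended])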
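
    And for the separate-training option, the simplest ensemble is to average the predicted class probabilities of the two models; cnn_model and lstm_model here are placeholders for two models you have already trained on the same data:

    import tensorflow as tf

    def ensemble_predict(cnn_model, lstm_model, x):
        # Average the class probabilities of two separately trained models
        p_cnn = cnn_model(x, training=False)
        p_lstm = lstm_model(x, training=False)
        return (p_cnn + p_lstm) / 2.0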