tensorflow, time-series, lstm, tf.keras, conv-neural-network

How to combine independent CNN and LSTM networks


I'm currently working on timeseries forecasting using TensorFlow and Keras. I built a CNN that performs quite well and a basic LSTM that also shows quite good results. Now I'm thinking about combining the strengths of both networks. My first thought was to just stack the LSTM on top of the CNN, but apart from the weak results, I realized that I want both networks to see the input data, so the CNN can learn about features while the LSTM focuses on the time-related aspects. What would be a good starting point for building this kind of architecture? I was also wondering whether it makes any sense to concatenate the outputs of both networks. I see this often, but I don't get why it would be useful; I always think of it as concatenating two different timeseries, which would not make sense at all. I already visited posts that seemed related to my question, but they were not what I was looking for.


Solution

    • If you work with Keras, you should implement your model with the functional API or by subclassing tf.keras.Model (a rough functional-API sketch of a two-branch model follows this list).
    • Concatenating the outputs of both nets is a good approach (it is like different people looking at the same object and trying to figure out what it is, so the combined result tends to be more accurate).
    • If you want, you can try other approaches to merging the features (rough sketches of both options are at the end of this answer):
      • A weighted sum with learnable weights is a good and simple option
      • Using an attention mechanism can also give good results
    • Another good option might be to train both nets separately and then ensemble the results of both worlds (a small sketch of this is at the end as well).
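
    As an example of the functional API route, a two-branch model could look roughly like this. It is only a sketch: the single Conv1D block, the GlobalAveragePooling1D over timesteps, and the constants are placeholder choices of mine:

    import tensorflow as tf

    TIMESTEPS = 16
    FEATURES = 32
    N_CLASSES = 3

    inputs = tf.keras.Input(shape=(TIMESTEPS, FEATURES))

    # CNN branch: Conv1D -> BatchNorm -> ReLU on the raw input
    conv_x = tf.keras.layers.Conv1D(64, 7, padding='same')(inputs)
    conv_x = tf.keras.layers.BatchNormalization()(conv_x)
    conv_x = tf.keras.layers.ReLU()(conv_x)

    # LSTM branch: also fed the raw input
    lstm_x = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)

    # Merge along the channel axis, pool over time, classify
    merged = tf.keras.layers.Concatenate(axis=-1)([conv_x, lstm_x])
    pooled = tf.keras.layers.GlobalAveragePooling1D()(merged)
    outputs = tf.keras.layers.Dense(N_CLASSES, activation='softmax')(pooled)

    functional_model = tf.keras.Model(inputs, outputs)
    functional_model.summary()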

    I attach a simple model example using two branches (CNN and LSTM), built by subclassing tf.keras.Model:

    import tensorflow as tf
    
    
    class CNNLSTMTimeseries(tf.keras.Model):
    
        def __init__(self, n_classes):
            super(CNNLSTMTimeseries, self).__init__()
    
            # CNN branch: two Conv1D + BatchNorm blocks (ReLU is applied in call)
            self.conv1 = tf.keras.layers.Conv1D(64, 7, padding='same',
                                                activation=None)
            self.bn1 = tf.keras.layers.BatchNormalization()

            self.conv2 = tf.keras.layers.Conv1D(64, 5, padding='same',
                                                activation=None)
            self.bn2 = tf.keras.layers.BatchNormalization()

            # LSTM branch, returning the full output sequence
            self.lstm = tf.keras.layers.LSTM(64, return_sequences=True)

            # Classification head applied to the merged features
            self.classifier = tf.keras.layers.Dense(n_classes, activation='softmax')
    
        def call(self, x):
            # CNN branch: Conv1D -> BatchNorm -> ReLU, applied to the raw input
            conv_x = tf.nn.relu(self.bn1(self.conv1(x)))
            conv_x = tf.nn.relu(self.bn2(self.conv2(conv_x)))

            # LSTM branch: also sees the raw input, not the CNN features
            lstm_x = self.lstm(x)

            # Merge both branches along the channel axis
            x = tf.concat([conv_x, lstm_x], axis=-1)
            x = tf.reduce_mean(x, axis=1) # Average all timesteps

            return self.classifier(x)
    
    
    # Quick smoke test on a random batch of shape (1, TIMESTEPS, FEATURES)
    TIMESTEPS = 16
    FEATURES = 32
    model = CNNLSTMTimeseries(3)
    print(model(tf.random.uniform([1, TIMESTEPS, FEATURES])).shape)  # -> (1, 3)
    

    The example is really simple and probably won't work as well as a carefully studied architecture. You should modify the example and add max pooling, dropout, etc.
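
    For the weighted sum with learnable weights, one simple option (the layer name and the single-scalar parameterization are just my illustration) is a tiny custom layer that mixes the two branch outputs with one trainable weight:

    import tensorflow as tf

    class WeightedSumMerge(tf.keras.layers.Layer):
        """Mixes two same-shaped branch outputs with one learnable weight."""

        def build(self, input_shape):
            # A single scalar logit, squashed into (0, 1) by a sigmoid in call()
            self.logit = self.add_weight(name='merge_logit', shape=(),
                                         initializer='zeros', trainable=True)

        def call(self, inputs):
            a, b = inputs
            w = tf.sigmoid(self.logit)
            return w * a + (1.0 - w) * b

    # Usage inside the model, in place of tf.concat:
    # merged = WeightedSumMerge()([conv_x, lstm_x])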
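
    For the attention option, one possibility is a cross-attention merge where the LSTM sequence attends over the CNN features; the num_heads and key_dim values below are arbitrary placeholders:

    import tensorflow as tf

    def attention_merge(lstm_x, conv_x):
        # Cross-attention: the LSTM timesteps query the CNN feature sequence
        attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
        attended = attn(query=lstm_x, value=conv_x)  # (batch, timesteps, channels)
        # Keep both the LSTM features and the attended CNN features
        return tf.keras.layers.Concatenate(axis=-1)([lstm_x, attended])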
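
    And for the separate-training option, the simplest ensemble is to average the predicted class probabilities of the two models; cnn_model and lstm_model here are placeholders for two models you have already trained on the same data:

    import tensorflow as tf

    def ensemble_predict(cnn_model, lstm_model, x):
        # Average the class probabilities of two separately trained models
        p_cnn = cnn_model(x, training=False)
        p_lstm = lstm_model(x, training=False)
        return (p_cnn + p_lstm) / 2.0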