
what is the difference between Sequential and Model([input],[output]) in TensorFlow?


It seems Sequential and Model([input],[output]) give the same results when I just build a model layer by layer. However, when I use the following two models with the same input, they give me different results. By the way, the input shape is (None, 15, 2) and the output shape is (None, 1, 2).
Sequential model:

model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv1D(filters = 4, kernel_size =7, activation = "relu"),
        tf.keras.layers.Conv1D(filters = 6, kernel_size = 11, activation = "relu"),
        tf.keras.layers.LSTM(100, return_sequences=True,activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.LSTM(100,activation='relu'),
        tf.keras.layers.Dense(2,activation='relu'),
        tf.keras.layers.Reshape((1,2))
    ]
)

Model([input],[output]) model:

input_layer = tf.keras.layers.Input(shape=(LOOK_BACK, 2)) 
conv = tf.keras.layers.Conv1D(filters=4, kernel_size=7, activation='relu')(input_layer)
conv = tf.keras.layers.Conv1D(filters=6, kernel_size=11, activation='relu')(conv)
lstm = tf.keras.layers.LSTM(100, return_sequences=True, activation='relu')(conv)
dropout = tf.keras.layers.Dropout(0.2)(lstm)
lstm = tf.keras.layers.LSTM(100, activation='relu')(dropout)
dense = tf.keras.layers.Dense(2, activation='relu')(lstm)
output_layer = tf.keras.layers.Reshape((1,2))(dense)
model = tf.keras.models.Model([input_layer], [output_layer])

The result of the Sequential model:


mse: 21.679258038588586
rmse: 4.65609901511862
mae: 3.963341420395535

And the result of the Model([input],[output]) model:

mse: 36.85855652774293
rmse: 6.071124815694612
mae: 4.4878270279889065


Solution

  • The Sequential version uses the Sequential API, while Model([inputs], [outputs]) uses the Functional API.

    The first is easier to use, but only works for single-input, single-output feed-forward models (in the sense of Keras layers).

    The second is more verbose but removes those constraints, allowing you to create many more kinds of models.

    So your main point is right: any Sequential model can be rewritten as a Functional model. You can double-check this by comparing the architectures with the summary() function and by plotting the models.
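For example, you could verify the equivalence on a small pair of models like this (a minimal sketch; the layer choices here are illustrative, not your exact architecture):

```python
import tensorflow as tf

# Sequential version
seq = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(15, 2)),
    tf.keras.layers.Conv1D(filters=4, kernel_size=7, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Equivalent Functional version
inp = tf.keras.layers.Input(shape=(15, 2))
x = tf.keras.layers.Conv1D(filters=4, kernel_size=7, activation="relu")(inp)
out = tf.keras.layers.Dense(2)(x)
func = tf.keras.models.Model(inp, out)

seq.summary()
func.summary()

# Same layer output shapes and the same total parameter count
print(seq.count_params(), func.count_params())
```

You can also render both graphs with tf.keras.utils.plot_model to compare them visually.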

    However, this only shows that the architectures are the same, not that the weights are!

    Assuming you are fitting both models with the same data and the same compile and fit parameters (by the way, include those in your question), there is a lot of randomness in the training process which may lead to different results. So, try the following to compare them more fairly:

    • remove as much randomness as possible by setting seeds, both in your code and in each layer instantiation.
    • avoid data augmentation, if you are using it.
    • use the same train/validation split for both models; to be sure, split the dataset yourself.
    • do not shuffle in data generators or during training.
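A minimal sketch of the seeding step (TensorFlow 2.7+ also offers tf.keras.utils.set_random_seed, which wraps the three seed calls below in one):

```python
import os
import random

import numpy as np
import tensorflow as tf

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)  # hash-based ops (ideally set before Python starts)
random.seed(SEED)         # Python's own RNG
np.random.seed(SEED)      # NumPy (shuffling, splits)
tf.random.set_seed(SEED)  # TensorFlow (weight init, dropout)

# Resetting the global seed makes TF's random stream repeat itself:
tf.random.set_seed(SEED)
a = tf.random.normal((3,))
tf.random.set_seed(SEED)
b = tf.random.normal((3,))
print(bool(tf.reduce_all(a == b)))  # True
```

Note that GPU kernels can still introduce non-determinism even with all seeds fixed.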

    Here you can read more about producing reproducible results in Keras.

    Even after following those tips, your results may not be deterministic, and hence not identical. So finally, and maybe most important: do not compare single runs. Train and evaluate each model several times (for instance, 20) and then compare the average MAE along with its standard deviation.
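For instance, with the per-run MAE values collected in two lists (the numbers below are made-up placeholders, not your results), the comparison could look like:

```python
import numpy as np

def summarize(name, maes):
    """Report mean MAE and sample standard deviation over several runs."""
    maes = np.asarray(maes, dtype=float)
    print(f"{name}: MAE = {maes.mean():.3f} +/- {maes.std(ddof=1):.3f}")

# MAEs from e.g. 20 independent train/eval runs of each model (placeholders)
seq_maes = [3.9, 4.1, 4.0, 3.8, 4.2]
fun_maes = [4.4, 4.3, 4.6, 4.5, 4.2]

summarize("Sequential", seq_maes)
summarize("Functional", fun_maes)
```

If the two mean MAEs differ by less than roughly one standard deviation, the gap you saw is likely just training noise.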

    If after all this your results are still so different, please update your question with them.