Flatten operation between Dense layers

In a Keras implementation, I once saw the last two fully connected layers defined as follows:

outX = Dense(300, activation='relu')(outX)
outX = Flatten()(outX)
predictions = Dense(1,activation='linear')(outX)

There is a Flatten layer between the two Dense layers. Why do we need a Flatten operation between two fully connected layers? Is it always required?


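  • Short answer: a Flatten layer has no trainable parameters of its own. However, adding one can increase the number of trainable parameters in the layers that follow it, because a Dense layer acts only on the last axis of its input. It is not always required: when the input to the Dense layer is already 2D (batch, features), Flatten changes nothing.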

    Example: try to figure out the difference between these two models:

    1) Without Flatten:

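    from keras.layers import Input, Dense, Flatten
    from keras.models import Model

    inp = Input(shape=(20, 10))
    A = Dense(300, activation='relu')(inp)  # acts on the last axis only
    # A = Flatten()(A)
    A = Dense(1, activation='relu')(A)
    m = Model(inputs=inp, outputs=A)
    m.summary()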


    input_9 (InputLayer)         (None, 20, 10)            0         
    dense_20 (Dense)             (None, 20, 300)           3300      
    dense_21 (Dense)             (None, 20, 1)             301       
    Total params: 3,601
    Trainable params: 3,601
    Non-trainable params: 0

    2) With Flatten:

    inp = Input(shape=(20,10,))
    A = Dense(300, activation='relu')(inp)
    A = Flatten()(A)  # (None, 20, 300) -> (None, 6000)
    A = Dense(1, activation='relu')(A)
    m = Model(inputs=inp,outputs=A)


    input_10 (InputLayer)        (None, 20, 10)            0 
    dense_22 (Dense)             (None, 20, 300)           3300      
    flatten_9 (Flatten)          (None, 6000)              0         
    dense_23 (Dense)             (None, 1)                 6001      
    Total params: 9,301
    Trainable params: 9,301
    Non-trainable params: 0

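    Note that the output shape differs as well: without Flatten the model returns (None, 20, 1), one value for each of the 20 positions, while with Flatten it returns a single value per sample, (None, 1).

    Finally, whether to add a Flatten layer depends on the data at hand. Having more parameters to learn can lead to a more accurate model, or it can cause overfitting. So one answer is: try both and keep whichever performs better.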
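
The parameter counts in the two summaries can be checked by hand. The sketch below is plain Python rather than Keras, and the helper `dense_params` is a hypothetical name introduced only for this illustration. It assumes a Dense layer with a bias term that acts on the last axis of its input, which matches the counts shown in the summaries.

```python
def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a fully connected layer (hypothetical helper).

    input_dim is the size of the last axis of the layer's input.
    """
    return input_dim * units + units


# Shapes taken from the example: Input(shape=(20, 10)), then Dense(300), then Dense(1).
steps, features, hidden = 20, 10, 300

# The first Dense layer sees 10 features per position in both models.
first_dense = dense_params(features, hidden)

# Without Flatten, the last Dense layer sees 300 features per position.
without_flatten = first_dense + dense_params(hidden, 1)

# With Flatten, the last Dense layer sees all 20 * 300 = 6000 values at once.
with_flatten = first_dense + dense_params(steps * hidden, 1)

print(first_dense)      # 3300
print(without_flatten)  # 3601
print(with_flatten)     # 9301
```

The extra 5,700 parameters all come from the final Dense layer, whose input width grows from 300 to 6000 once the position axis is folded into the feature axis.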