In a Keras implementation, I once saw the last two fully connected layers defined as follows:
outX = Dense(300, activation='relu')(outX)
outX = Flatten()(outX)
predictions = Dense(1, activation='linear')(outX)
Between the two Dense layers there is a Flatten layer. Why must we add a Flatten operation between two fully connected layers? Is that always required?
Short answer: a Flatten layer has no parameters of its own to learn. However, adding one to the model can increase the number of trainable parameters in the model.
Example: try to figure out the difference between these two models:
1) Without Flatten:
from keras.layers import Input, Dense, Flatten
from keras.models import Model

inp = Input(shape=(20, 10,))
A = Dense(300, activation='relu')(inp)
# A = Flatten()(A)
A = Dense(1, activation='relu')(A)
m = Model(inputs=inp, outputs=A)
m.summary()
Output:
Layer (type)                 Output Shape         Param #
input_9 (InputLayer)         (None, 20, 10)       0
dense_20 (Dense)             (None, 20, 300)      3300
dense_21 (Dense)             (None, 20, 1)        301
Total params: 3,601
Trainable params: 3,601
Non-trainable params: 0
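The parameter counts can be checked by hand: when Dense receives a 3-D input such as (None, 20, 10), it operates on the last axis only, so its weights are shared across the 20 rows. A quick sanity check of the numbers above:

print(10 * 300 + 300)   # 3300 weights + biases in dense_20, shared across the 20 rows
print(300 * 1 + 1)      # 301 weights + biases in dense_21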
2) With Flatten:
inp = Input(shape=(20, 10,))
A = Dense(300, activation='relu')(inp)
A = Flatten()(A)
A = Dense(1, activation='relu')(A)
m = Model(inputs=inp, outputs=A)
m.summary()
Output:
Layer (type)                 Output Shape         Param #
input_10 (InputLayer)        (None, 20, 10)       0
dense_22 (Dense)             (None, 20, 300)      3300
flatten_9 (Flatten)          (None, 6000)         0
dense_23 (Dense)             (None, 1)            6001
Total params: 9,301
Trainable params: 9,301
Non-trainable params: 0
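The same arithmetic shows where the extra parameters come from: Flatten turns the (None, 20, 300) tensor into a single vector of 20 * 300 = 6,000 features, and the final Dense then needs a weight for every one of them:

print(20 * 300)         # 6000 features after Flatten
print(6000 * 1 + 1)     # 6001 weights + biases in dense_23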
Note also that the two models differ in output shape: without Flatten the final Dense is applied to each of the 20 rows separately, so the output is (None, 20, 1), while with Flatten it is a single (None, 1) prediction per sample. Finally, whether or not to add a Flatten layer depends on the data at hand. Having more parameters to learn can lead to a more accurate model, or it can cause overfitting. So one answer is: try both and keep whichever performs better.
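As a minimal sketch of "try both, keep the best" (the arrays below are random placeholders, not real data; note that the no-Flatten variant predicts one value per row, so for illustration its target is repeated along that axis):

import numpy as np
from keras.layers import Input, Dense, Flatten
from keras.models import Model

def build(use_flatten):
    inp = Input(shape=(20, 10,))
    A = Dense(300, activation='relu')(inp)
    if use_flatten:
        A = Flatten()(A)
    A = Dense(1, activation='relu')(A)
    return Model(inputs=inp, outputs=A)

# Hypothetical data: 1000 training / 200 validation samples, one scalar target each
x_tr, x_va = np.random.rand(1000, 20, 10), np.random.rand(200, 20, 10)
y_tr, y_va = np.random.rand(1000, 1), np.random.rand(200, 1)

scores = {}
for use_flatten in (False, True):
    m = build(use_flatten)
    m.compile(optimizer='adam', loss='mse')
    if use_flatten:
        yt, yv = y_tr, y_va                           # targets of shape (N, 1)
    else:
        yt = np.repeat(y_tr[:, None, :], 20, axis=1)  # targets of shape (N, 20, 1)
        yv = np.repeat(y_va[:, None, :], 20, axis=1)
    h = m.fit(x_tr, yt, validation_data=(x_va, yv),
              epochs=10, batch_size=32, verbose=0)
    scores[use_flatten] = min(h.history['val_loss'])

print(scores)  # keep whichever variant reaches the lower validation loss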