I have trained the following CNN model on a fairly small data set, so it overfits:
from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)
from keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
The model has a lot of trainable parameters (more than 3 million), which is why I wonder whether I should reduce the number of parameters with an additional MaxPooling layer, like this:
Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten
or with an additional MaxPooling and Dropout, like this:
Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten
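For concreteness, the second variant would look roughly like this in Keras (just a sketch; the 0.25 dropout rates after each pooling step are placeholders I picked, not tested values):

from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)
from keras.optimizers import Adam

model = Sequential()
# first block: Conv - BN - Act - MaxPooling - Dropout
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))  # placeholder rate
# second block: Conv - BN - Act - MaxPooling - Dropout
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))  # placeholder rate
# classifier head, unchanged from the original model
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])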
I am trying to understand the real purpose of MaxPooling and whether it can help against overfitting.
Overfitting can happen when your dataset is not large enough relative to the number of trainable parameters in your model. Max pooling uses a max operation to pool spatial neighborhoods of features, leaving you with smaller feature maps; fewer flattened features in turn means fewer weights in the Dense layer that follows. Therefore, max pooling should logically reduce overfitting.
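As a back-of-the-envelope check (assuming 'same' padding, 2x2 pooling and 32 channels, as in your code), almost all of the parameters sit in the first Dense layer, and one extra pooling step cuts that count by roughly a factor of four:

# Rough count of the first Dense layer's parameters (weights + biases),
# for 28x28 inputs, 'same'-padded convolutions, 32 channels and 2x2 pooling.
channels, dense_units = 32, 512

one_pool  = (28 // 2) * (28 // 2) * channels   # 14*14*32 = 6272 flattened features
two_pools = (28 // 4) * (28 // 4) * channels   # 7*7*32   = 1568 flattened features

print(one_pool  * dense_units + dense_units)   # 3,212,288 -> the ~3 million you see
print(two_pools * dense_units + dense_units)   # 803,328   -> roughly 4x fewer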
Dropout reduces reliance on any single feature by ensuring that the feature is not always available, forcing the model to look for different potential hints rather than sticking with just one, which would easily let it overfit on any apparently good hint. Therefore, dropout should also help reduce overfitting.
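If it helps to see the mechanism, here is a minimal standalone sketch with tf.keras (not part of your model) showing that Dropout only masks features at training time:

import tensorflow as tf

# Dropout(0.5) zeros a random ~half of the activations during training and
# rescales the survivors by 1/(1 - 0.5) = 2 so the expected value is preserved;
# at inference time the input passes through unchanged.
x = tf.ones((1, 8))
dropout = tf.keras.layers.Dropout(0.5)

print(dropout(x, training=True).numpy())   # a mix of 0.0 and 2.0
print(dropout(x, training=False).numpy())  # all 1.0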