
Why do we use MaxPooling 2x2? Can we use any other size like 3x3 or 5x5? And how do we select which pooling to use in which scenario?


Greetings,

I've searched everywhere on YouTube and Google, and also read some articles and research papers, but I can't seem to find exact answers to my questions.

I have a few questions regarding convolutional neural networks. I'm confused about this: why do we use a MaxPooling size of 2x2? Why don't we use any other size like 3x3, 4x4, ... nxn (smaller than the input, of course), and can we even use anything other than 2x2? My other question is: why do we use MaxPooling most of the time? Does it depend on the images? For example, if we have some noisy images, would MaxPooling still be suitable, or should we use another type of pooling?

Thank you!


Solution

  • MaxPool2D downsamples its input along its spatial dimensions (height and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. For example, if I apply 2x2 MaxPooling2D on this array:

    import numpy as np

    array = np.array([
        [[5], [8]],
        [[7], [2]]
    ])
    

    Then the result would be 8, the maximum element of the array.
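    To see the same reduction outside Keras, here is a minimal plain-NumPy sketch (my own code, not the Keras implementation): a 2x2 pool applied to a (2, 2, 1) input covers the whole input, so it collapses to a single maximum per channel.

```python
import numpy as np

# A (2, 2, 1) input: height 2, width 2, one channel.
array = np.array([
    [[5], [8]],
    [[7], [2]]
])

# A 2x2 max pool over a 2x2 input spans the whole input, so it
# reduces to one value per channel: the maximum of the window.
pooled = array.max(axis=(0, 1))
print(pooled)  # [8]
```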
    As another example, if I apply a 2x2 MaxPooling2D on this array:

    import tensorflow as tf

    array = tf.constant([[[1.], [2.], [3.]],
                         [[4.], [5.], [6.]],
                         [[7.], [8.], [9.]]])
    

    Then, with strides of (1, 1), the output would be this:

    ([
    [[5.], [6.]],
    [[8.], [9.]]
    ])
    

    What MaxPooling2D did here is slide a 2x2 window over the input (with a stride of 1 in this example) and take the maximum value inside each window, shrinking the 3x3 input to 2x2. Note that with the Keras default, where the stride equals the pool size, a 2x2 pool halves the input's height and width. If you still have trouble seeing how this works, check the MaxPooling2D documentation from Keras.
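    The sliding-window behavior can be made concrete with a naive NumPy sketch (my own helper, assuming 'valid' padding and a channels-last (H, W, C) layout, not Keras's actual implementation):

```python
import numpy as np

def max_pool_2d(x, pool=2, stride=1):
    """Naive max pooling over an (H, W, C) array: a sketch of what
    Keras's MaxPooling2D computes with 'valid' padding."""
    h, w, c = x.shape
    out_h = (h - pool) // stride + 1
    out_w = (w - pool) // stride + 1
    out = np.zeros((out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Take the max over each pool x pool window, per channel.
            window = x[i * stride:i * stride + pool,
                       j * stride:j * stride + pool, :]
            out[i, j, :] = window.max(axis=(0, 1))
    return out

# The 3x3 single-channel input from the example above.
x = np.arange(1.0, 10.0).reshape(3, 3, 1)
out = max_pool_2d(x, pool=2, stride=1)
print(out[..., 0])
# [[5. 6.]
#  [8. 9.]]
```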

    Now that it is clear that MaxPool2D downsamples the input, let's get back to your question:

    Why is a 2x2 MaxPooling used everywhere and not 3x3 or 4x4?

    Well, the reason comes down to how much data is kept. Applying a 3x3 MaxPooling2D to a matrix of shape (3, 3, 1) results in a (1, 1, 1) matrix, while applying a 2x2 MaxPooling2D (with a stride of 1) to the same input results in a (2, 2, 1) matrix. Obviously, a (2, 2, 1) matrix can keep more data than a (1, 1, 1) matrix. Applying a MaxPooling2D operation with a pool size larger than 2x2 often discards too much information, so 2x2 is usually the better choice. This is why you see 2x2 MaxPooling2D 'everywhere', as in ResNet50, VGG16, etc.
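    The shape arithmetic behind this argument can be checked directly. Here is a small helper (my own, assuming 'valid' padding) that computes the output side length for a given input size, pool size, and stride:

```python
# With 'valid' padding, the output side length of a pooling layer is
# (n - pool) // stride + 1; the Keras default stride equals the pool size.
def pooled_side(n, pool, stride=None):
    stride = pool if stride is None else stride
    return (n - pool) // stride + 1

# On a 3x3 input:
print(pooled_side(3, 3))             # 1 -> a (1, 1, 1) output, most data lost
print(pooled_side(3, 2, stride=1))   # 2 -> a (2, 2, 1) output, more data kept

# On a typical 224x224 image, a 2x2 pool with the default stride
# halves each spatial dimension:
print(pooled_side(224, 2))           # 112
```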