Tags: tensorflow, deep-learning, neural-network, conv-neural-network, pooling

How does the average pooling function work in TensorFlow?


Let us assume a tensor like this:

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.],
                 [7., 8., 9.]])

To apply the average pooling function, I will do this:

x = tf.reshape(x, [1, 3, 3, 1])
avg_pool_2d = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')
avg_pool_2d(x)

The result is:

<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[3. ],
         [4.5]],
        [[7.5],
         [9. ]]]], dtype=float32)>

I can follow the logic behind this result:

(1+2+4+5)/4 = 3
(3+6)/2 = 4.5
(7+8)/2 = 7.5
(9/1) = 9

I think the logic is as follows: normally the pooling filter lies entirely inside the tensor when the pooling operation is applied. But when the filter does not fit entirely inside the tensor (see the figure below for an example), the sum is divided only by the number of filter elements that actually lie inside the tensor. The figure illustrates this for a 4-by-3 tensor with a 2-by-2 pooling filter, a stride of 2, and padding 'same'.

[Figure: pooling windows for a 4-by-3 tensor with a 2-by-2 filter, stride 2, and padding 'same'.]
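
Here is a minimal NumPy sketch of that rule, written purely for illustration (avg_pool_partial is my own helper, not a TensorFlow function): windows are anchored at the top-left corner, so any padding effectively falls on the right/bottom, and each window is averaged only over the elements that lie inside the tensor. It reproduces the result above:

import math
import numpy as np

def avg_pool_partial(img, pool, stride):
    # Illustrative helper: anchor each window at the top-left, clip it to the
    # tensor, and average only over the in-bounds elements.
    out_h = math.ceil(img.shape[0] / stride[0])
    out_w = math.ceil(img.shape[1] / stride[1])
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = img[i * stride[0]: i * stride[0] + pool[0],
                         j * stride[1]: j * stride[1] + pool[1]]
            out[i, j] = window.mean()   # divide by the in-bounds count only
    return out

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(avg_pool_partial(x, (2, 2), (2, 2)))
# [[3.  4.5]
#  [7.5 9. ]]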

However, it does not always work like this. For example, consider the following tensor:

y = tf.constant([[1., 2., 3., 4., 5.],
                 [6., 7., 8., 9., 10.]])

Then, I do this:

y = tf.reshape(y, [1, 2, 5, 1])
avg_pool_2d = tf.keras.layers.AveragePooling2D(pool_size=(4, 4), strides=(4, 4), padding='same')
avg_pool_2d(y)

The result is like this:

<tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
array([[[[4.5],
         [7. ]]]], dtype=float32)>

If I followed the logic from the first example, I would expect the result to be:

(1+2+3+4+6+7+8+9)/8 = 5
(5+10)/2 = 7.5

I am using TensorFlow 2.8.0. What mistake am I making?


Solution

  • With a 2-by-2 filter and a stride of 2, at most one extra row/column of padding is needed, and 'same' padding always puts it on the right/bottom of the tensor. With a 4-by-4 filter, however, more padding can be needed, and it is then split across both sides: left/right and top/bottom. The following scenarios show where the padding goes (a sketch of the arithmetic behind the split follows the scenarios):

    avg = tf.keras.layers.AveragePooling2D(pool_size=(4, 4), strides=(4, 4), padding='same')
    

    Scenario 1:

        y = tf.constant([[1., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])
        y = tf.reshape(y, [1, 2, 5, 1])
        avg(y)
    

    Result:

        <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
        array([[[[0.16666667],
                 [0.        ]]]], dtype=float32)>
    


    Scenario 2:

        y = tf.constant([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 1.]])
        y = tf.reshape(y, [1, 2, 5, 1])
        avg(y)
    

    Result:

        <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
        array([[[[0.  ],
                 [0.25]]]], dtype=float32)>
    


    Scenario 3:

        y = tf.constant([[0., 0., 0., 1., 0.], [0., 0., 0., 0., 1.]])
        y = tf.reshape(y, [1, 2, 5, 1])
        avg(y)
    

    Result:

        <tf.Tensor: shape=(1, 1, 2, 1), dtype=float32, numpy=
        array([[[[0. ],
                 [0.5]]]], dtype=float32)>
    

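    The split itself follows the standard 'same'-padding arithmetic, which the scenarios above confirm: the output length along a dimension is ceil(input / stride), the total padding is (output - 1) * stride + pool - input, and the smaller half goes before (top/left) while the larger half goes after (bottom/right). Below is a minimal sketch of that arithmetic applied to the 2-by-5 tensor from the question (same_pad_split is an illustrative helper, not part of TensorFlow); it reproduces the 4.5 and 7.0 in the result:

        import math

        def same_pad_split(size, pool, stride):
            # 'same'-padding arithmetic along one dimension: output length,
            # padding added before (top/left) and after (bottom/right).
            out = math.ceil(size / stride)
            total = max((out - 1) * stride + pool - size, 0)
            return out, total // 2, total - total // 2

        # rows: size 2, pool 4, stride 4 -> 1 output, 1 padded row above, 1 below
        print(same_pad_split(2, 4, 4))   # (1, 1, 1)
        # cols: size 5, pool 4, stride 4 -> 2 outputs, 1 padded column left, 2 right
        print(same_pad_split(5, 4, 4))   # (2, 1, 2)

        # With one padded column on the left, the first 4x4 window covers
        # columns -1..2 (in-bounds: 0..2) and the second covers columns 3..6
        # (in-bounds: 3..4). Averaging over the in-bounds elements only:
        print((1 + 2 + 3 + 6 + 7 + 8) / 6)   # 4.5
        print((4 + 5 + 9 + 10) / 4)          # 7.0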