python, tensorflow, keras, max-pooling

Understanding average (sum) pooling padding in keras


I have a simple sum pooling implemented in Keras/TensorFlow: AveragePooling2D multiplied by N*N, so it produces the sum of the elements in each N×N pool. I use 'same' padding so the output shape does not change:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf

#generating the example matrix
def getMatrixByDefinitions(definitions, width, height):
    # each definition is a [y, x, value] triple placed into a width x height matrix of zeros
    matrix = np.zeros((width, height))
    for definition in definitions:
        x_cor = definition[1]
        y_cor = definition[0]
        value = definition[2]
        matrix[x_cor, y_cor] = value
    return matrix

generated = getMatrixByDefinitions(width=32, height=32, definitions =[[7,16,1]])

def avg_pool(pool):
    # 'same' padding keeps the spatial dimensions unchanged
    return tf.keras.layers.AveragePooling2D(pool_size=(pool, pool), strides=(1, 1), padding='same')

def summer(pool, tensor):
    # turn the average into a sum by multiplying by the window area
    return avg_pool(pool)(tensor) * pool * pool

def numpyToTensor(numpy_data):
    # reshape to NHWC layout: (batch, height, width, channels)
    numpy_as_array = np.asarray(numpy_data)
    tensor_data = numpy_as_array.reshape(1, numpy_data.shape[0], numpy_data.shape[1], 1)
    return tensor_data

data = numpyToTensor(generated)
pooled_data = summer(11, data)

def printMatrixesToHeatMap(matrixes, title):
    # each entry of matrixes is a [matrix, subplot_title] pair
    matrix_count = len(matrixes)
    width_ratios = [4] * matrix_count + [0.2]

    # stack all matrices so every heatmap shares the same colour scale
    mergedMatrixes = np.concatenate([matrix[0] for matrix in matrixes], axis=0)
    vmin = np.min(mergedMatrixes)
    vmax = np.max(mergedMatrixes)

    fig, axs = plt.subplots(ncols=matrix_count + 1, gridspec_kw=dict(width_ratios=width_ratios))
    fig.set_figheight(20)
    fig.set_figwidth(20 * matrix_count + 5)

    for axis_id, matrix in enumerate(matrixes):
        sns.heatmap(matrix[0], annot=True, cbar=False, ax=axs[axis_id], vmin=vmin, vmax=vmax)
        axs[axis_id].set_title(matrix[1])

    #fig.colorbar(axs[1].collections[0], cax=axs[matrix_count])
    fig.savefig(title + ".pdf", bbox_inches='tight')

def tensorToNumpy(tensor):
    width = tensor.get_shape()[1]
    height = tensor.get_shape()[2]

    # drop the batch and channel dimensions
    output = tf.reshape(tensor, [width, height])
    return output.numpy()

printMatrixesToHeatMap([[tensorToNumpy(pooled_data), "Pooled data"]], "name")

After testing it on a very simple 2D array, I found that it does not do what I expect (original and pooled data below):

[image: Original data]

[image: Pooled data]

You can see that the single 1, after sum pooling (average pooling multiplied by the window area), produces sums greater than the real sum of 1 near the borders. (In this simple case max pooling would do, but the real data are more complex and we need a sum.) This would mean that near the borders the average is computed only over the original elements, not over the padded ones. Or is this a misunderstanding of the padding on my side? I need to have ones at the indices where 1.1, 1.2 and 1.4 appear. Why does this happen, and how can I solve it?

Note that I do not want to set the correct sums manually, so I am looking for a way to achieve this in Keras pooling itself.
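
To isolate the effect, here is a minimal sketch of my own (assuming the same AveragePooling2D setup as above) on a 4x4 block of ones with a 3x3 window. If the padded zeros were included in the average, the corners would come out as 4/9; instead every output is 1, so multiplying by pool*pool gives 9 everywhere rather than the true sums:

import numpy as np
import tensorflow as tf

# 4x4 block of ones in NHWC layout
ones = np.ones((1, 4, 4, 1), dtype=np.float32)

pool = 3
avg = tf.keras.layers.AveragePooling2D(pool_size=pool, strides=1, padding='same')(ones)

print(np.squeeze(avg.numpy()))
# every entry is 1.0, even at the corners where the window only covers 4 valid inputs,
# i.e. the padded zeros are apparently excluded from the divisor

print(np.squeeze((avg * pool * pool).numpy()))
# every entry becomes 9.0, instead of the true sums
# (4.0 at the corners, 6.0 along the edges, 9.0 in the interior)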


Solution

  • It seems to be a problem with the "SAME" padding algorithm. Unfortunately, there is no way of specifying an explicit padding to the avg_pool2d op. It is possible to manually pad the input with tf.pad, though. Here is a really naive approach to padding that will work with odd-shaped pooling filters and a stride of 1:

    generated = getMatrixByDefinitions(width=32, height=32, definitions=[[7,16,1]])
    gen_nhwc = tf.constant(generated[np.newaxis,:,:,np.newaxis])
    pool = 11
    # pad the two spatial dimensions with pool//2 zeros on each side
    paddings = [[0,0],[pool//2,pool//2],[pool//2,pool//2],[0,0]]
    gen_pad = tf.pad(gen_nhwc, paddings, "CONSTANT")
    # 'VALID' pooling on the explicitly padded input always divides by pool*pool
    res = tf.nn.avg_pool2d(gen_pad, (pool,pool), (1,1), "VALID")*pool*pool
    result = np.squeeze(res.numpy())
    printMatrixesToHeatMap([[generated, "input"],[result, "output"]], "name")
    

    Results in images:

    Input on the left, output on the right. The input is an empty 32x32 matrix with a single 1 at position (7,16). The output is an empty 32x32 matrix with an 11x11 square of ones between positions (2,11) and (12,21).


    Edit: I created an issue on GitHub regarding the problem.
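
    A possible variant that stays entirely within Keras layers, as a minimal sketch (assuming explicit ZeroPadding2D followed by 'valid' pooling; this is not part of the original answer): the zero padding is applied as a layer, so every pooling window again covers exactly pool*pool elements.

    pool = 11
    # explicit zero padding followed by 'valid' average pooling;
    # the divisor is now always pool*pool, even at the borders
    sum_pool = tf.keras.Sequential([
        tf.keras.layers.ZeroPadding2D(padding=pool // 2),
        tf.keras.layers.AveragePooling2D(pool_size=pool, strides=1, padding='valid'),
    ])

    # reuses `generated` from above
    gen_nhwc = tf.constant(generated[np.newaxis, :, :, np.newaxis], dtype=tf.float32)
    result = np.squeeze(sum_pool(gen_nhwc).numpy()) * pool * pool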