tensorflow padding convolution autoencoder

Finding the amount of zeros to pad the input of a convolutional layer


I am using these sources to build a convolutional autoencoder in TensorFlow. I understand that I need to pad my input image with zeros so that the decoder output has the same shape as the original input. The author gives an example for the simple case of a square kernel and equal vertical and horizontal strides. I need to generalize this padding function for my input, but I fail to get the correct shape for my tensor. My function so far is:

def _pad(self, input_x, filter_height, filter_width):
    """
    Pads input_x with zeros along its height and width.
    Args:
        input_x: 4-D tensor, [batch_size, height, width, depth]
        filter_height: kernel height, used to determine the top/bottom padding
        filter_width: kernel width, used to determine the left/right padding
    Returns:
        input_x padded
    """
    # calculate the padding amount for each side
    top_bottom_padding = filter_height - 1
    left_right_padding = filter_width - 1

    # pad the height (top/bottom) and width (left/right) with zeros;
    # the batch and depth dimensions are left unpadded
    return tf.pad(input_x,
                  [[0, 0],
                   [top_bottom_padding, top_bottom_padding],
                   [left_right_padding, left_right_padding],
                   [0, 0]])

This gives me

Shape of input:  (10, 161, 1800, 1)
Shape of padded input: (10, 187, 1826, 1)
Shape of encoder output:  (10, 187, 913, 15)
Shape of decoder output:  (10, 187, 457, 15)

for

num_outputs=15, kernel_size=14, stride=[1,2]

Any idea on what I'm doing wrong?


Solution

  • The function you use does not take strides into account. It simply subtracts 1 from the kernel size and pads every side by that amount. For the 1-D case, given the input size i, kernel size k, stride s and padding p, you can calculate the output size o of the convolution as:

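    $$o = \left\lceil \frac{i + 2p - k + 1}{s} \right\rceil$$

    Here $\lceil \cdot \rceil$ is the ceiling operation and p is the number of zeros added on each side; for integer sizes this is the same as $\lfloor (i + 2p - k)/s \rfloor + 1$. Once you know the math for the 1-D case, the n-D case is easy because each dimension is independent, so you just apply the formula to each dimension separately.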


    Looking at the formula, and knowing that your o should be equal to i, you can calculate the appropriate padding.
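As a rough illustration of that calculation, here is a minimal sketch in plain Python. It assumes the convolution itself adds no padding of its own (the `VALID` setting in TensorFlow) and works with the total amount of padding per dimension, 2p in the formula above, so that odd totals can be split unevenly between the two sides. The helper names (`conv_output_size`, `total_padding`, `split_padding`, `nhwc_paddings`) are made up for this example and are not part of TensorFlow.

```python
def conv_output_size(i, k, s, p_total):
    """Output length of a 1-D convolution without implicit padding.

    i: input size, k: kernel size, s: stride,
    p_total: total number of zeros added (both sides together, i.e. 2p).
    """
    return (i + p_total - k) // s + 1


def total_padding(i, k, s, o=None):
    """Smallest total padding that yields an output of size o (default: o = i).

    Obtained by solving (i + p_total - k) // s + 1 == o for p_total.
    """
    if o is None:
        o = i
    return max((o - 1) * s + k - i, 0)


def split_padding(p_total):
    """Split a total padding into (before, after); the extra zero goes after."""
    before = p_total // 2
    return before, p_total - before


def nhwc_paddings(height, width, kernel, strides):
    """Build a paddings list in the layout expected by tf.pad for an
    input shaped [batch, height, width, depth].

    kernel:  (kernel_height, kernel_width)
    strides: (stride_height, stride_width)
    """
    k_h, k_w = kernel
    s_h, s_w = strides
    pad_h = split_padding(total_padding(height, k_h, s_h))
    pad_w = split_padding(total_padding(width, k_w, s_w))
    return [[0, 0], list(pad_h), list(pad_w), [0, 0]]


# Values from the question: input (10, 161, 1800, 1), kernel 14, stride [1, 2]
paddings = nhwc_paddings(161, 1800, kernel=(14, 14), strides=(1, 2))
print(paddings)
```

The resulting list could be passed as the second argument of `tf.pad`. Note that with a stride of 2, keeping o equal to i in a single convolution requires roughly as many zeros as there are input values along that dimension, so it may be more practical to apply the formula to the decoder output size instead.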