
what is the effect of `tf.nn.max_pool(input_tensor, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")` on an input tensor shape?


I am studying TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py

His convolution layer is specifically defined as:

def conv_layer(input, size_in, size_out, name="conv"):
  with tf.name_scope(name):
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

I am trying to work out the effect of the max pool on the input tensor size. As far as I can tell, it seems to roughly halve the middle two dimensions, sometimes with a +1 in there — perhaps when those dimensions are odd-valued.

For example, ?x188x141x32 input becomes ?x94x71x32

And I also see that: ?x47x36x128 becomes ?x24x18x128

So, is the resultant size for input: [a,b,c,d] the output size of [a,(b+1)//2,(c+1)//2,d]?

Would it be correct to think that the first dimension does not change?

Is there a general way to write the input and output sizes based on kernel and stride size?
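The shapes observed above can be checked with a quick plain-Python computation (no TensorFlow needed) — with `padding="SAME"` and stride 2, each spatial dimension becomes ceil(n/2), which for integers equals `(n + 1) // 2`:

```python
import math

def pooled_shape(shape, strides=(1, 2, 2, 1)):
    """Output shape of max_pool with padding="SAME":
    each dimension is divided by its stride and rounded up."""
    return tuple(math.ceil(n / s) for n, s in zip(shape, strides))

print(pooled_shape((1, 188, 141, 32)))  # (1, 94, 71, 32)
print(pooled_shape((1, 47, 36, 128)))   # (1, 24, 18, 128)
```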


Solution

  • The specific clue is in the strides parameter: it determines how many cells the kernel shifts on each step. Since the kernel size and the stride match here, your "resultant size" computation is correct as far as it goes. For each dimension, the formula is

    ceil( n/stride )
    

    In short, divide and round up. Your given stride vector is (1, 2, 2, 1), so the divisor is 1 for a and d. For the middle dimensions the stride is 2, and (n+1)//2 is indeed equivalent to ceil(n/2).

    If the kernel size doesn't match the stride, note that this only matters with padding="VALID": there, the output size is the number of positions the kernel can occupy before its far edge crosses the far edge of the layer. (With padding="SAME", TensorFlow pads the input as needed, so the output stays ceil(n/stride) regardless of kernel size.) For the VALID case:

    k = kernel size in that dimension
    n = layer size in that dimension
    new_size = 1 + (n - k) // stride
    

    I hope I got my boundary condition correct in that last line ...
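The two padding cases can be contrasted with a couple of small helper functions (hypothetical names, plain Python, one dimension at a time):

```python
def out_size_same(n, s):
    # padding="SAME": ceil(n / s); the kernel size does not matter,
    # because TensorFlow pads the input as needed.
    return -(-n // s)  # integer ceiling division

def out_size_valid(n, k, s):
    # padding="VALID": 1 + (n - k) // s, i.e. the number of full
    # kernel placements that fit without crossing the boundary.
    return 1 + (n - k) // s

# With k == s == 2 and even n the two agree; with odd n, SAME pads
# one extra cell while VALID drops the remainder.
print(out_size_same(141, 2))      # 71
print(out_size_valid(141, 2, 2))  # 70
```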