python machine-learning tensorflow conv-neural-network deconvolution

Transpose convolution (deconvolution) arithmetic


I am using TensorFlow to construct a convolutional neural network. Given a tensor of shape (None, 16, 16, 4, 192), I want to perform a transposed convolution that results in the shape (None, 32, 32, 7, 192).

Would a filter size of [2,2,4,192,192] and stride of [2,2,1,1,1] produce the output shape that I want?


Solution

  • Yes, you are almost right.

    One minor correction is that tf.nn.conv3d_transpose expects NCDHW or NDHWC input format (yours appears to be NHWDC) and the filter shape is expected to be [depth, height, width, output_channels, in_channels]. This affects the order of dimensions in the filter and stride:

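    import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.get_variable)

    # Original format: NHWDC.
    original = tf.placeholder(dtype=tf.float32, shape=[None, 16, 16, 4, 192])
    print(original.shape)

    # Convert to NDHWC format. tf.transpose permutes the axes; tf.reshape
    # would keep the element order and scramble the data across axes.
    input = tf.transpose(original, perm=[0, 3, 1, 2, 4])
    print(input.shape)

    # input shape:  [batch, depth, height, width, in_channels].
    # filter shape: [depth, height, width, output_channels, in_channels].
    # output shape: [batch, depth, height, width, output_channels].
    filter = tf.get_variable('filter', shape=[4, 2, 2, 192, 192], dtype=tf.float32)
    # 'VALID' padding gives (in - 1) * stride + kernel along each axis:
    # depth (4 - 1) * 1 + 4 = 7, height/width (16 - 1) * 2 + 2 = 32.
    # 'SAME' would give in * stride, i.e. depth 4 rather than 7.
    # The -1 batch entry is enough for static shape inference; when running
    # the graph you will likely need the real batch size, e.g. tf.shape(input)[0].
    conv = tf.nn.conv3d_transpose(input,
                                  filter=filter,
                                  output_shape=[-1, 7, 32, 32, 192],
                                  strides=[1, 1, 2, 2, 1],
                                  padding='VALID')
    print(conv.shape)

    # Convert back to the original NHWDC format.
    final = tf.transpose(conv, perm=[0, 2, 3, 1, 4])
    print(final.shape)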
    

    Which outputs:

    (?, 16, 16, 4, 192)
    (?, 4, 16, 16, 192)
    (?, 7, 32, 32, 192)
    (?, 32, 32, 7, 192)
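
The shape arithmetic can be checked without TensorFlow. A transposed convolution is the gradient of an ordinary convolution, so a requested output size is consistent when running the forward convolution on it gives back the input size. The sketch below uses the standard SAME/VALID output-size rules for a forward convolution; the helper names `conv_output_size` and `is_consistent` are made up for this illustration and are not part of any library.

```python
import math


def conv_output_size(size, kernel, stride, padding):
    """Output size of a forward convolution along one spatial axis."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    if padding == 'VALID':
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def is_consistent(in_sizes, out_sizes, kernels, strides, padding):
    """True if a transposed convolution can map in_sizes to out_sizes.

    The forward convolution applied to out_sizes must give in_sizes.
    """
    return all(
        conv_output_size(o, k, s, padding) == i
        for i, o, k, s in zip(in_sizes, out_sizes, kernels, strides)
    )


# Spatial axes in (depth, height, width) order, as in NDHWC.
in_sizes = (4, 16, 16)
out_sizes = (7, 32, 32)
kernels = (4, 2, 2)
strides = (1, 2, 2)

print(is_consistent(in_sizes, out_sizes, kernels, strides, 'VALID'))  # True
print(is_consistent(in_sizes, out_sizes, kernels, strides, 'SAME'))   # False
```

With a depth stride of 1, 'SAME' padding keeps the depth at 4, so growing it from 4 to 7 with a depth kernel of 4 only works out under 'VALID' padding.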