Deconvolution2D in Keras: how to determine output_shape

I am implementing the following colorization model, originally written in Caffe, in Keras. I am confused about what to supply for the output_shape parameter:

model.add(Deconvolution2D(256,4,4,border_mode='same',
output_shape=(None,3,14,14),subsample=(2,2),dim_ordering='th',name='deconv_8.1'))

I have added a dummy output_shape parameter, but how can I determine the correct value? In the Caffe model the layer is defined as:

layer {
  name: "conv8_1"
  type: "Deconvolution"
  bottom: "conv7_3norm"
  top: "conv8_1"
  convolution_param {
    num_output: 256
    kernel_size: 4
    pad: 1
    dilation: 1
    stride: 2
  }
}

If I do not supply this parameter, the code raises a parameter error, but I cannot figure out what I should supply as output_shape.

P.S. I already asked this on the Data Science forum with no response, maybe due to its small user base.

Solution

What output shape does the Caffe deconvolution layer produce?

For this colorization model in particular you can simply refer to page 24 of their paper (which is linked in their GitHub page):

So the output shape of this deconvolution layer in the original model is [None, 56, 56, 128], and this is what you want to pass to Keras as output_shape. The only problem, as I mention in the section below, is that Keras doesn't really use this parameter to determine the output shape, so you need to run a dummy prediction to find out what your other parameters need to be in order to get the shape you want.

More generally, the Caffe source code for computing the output shape of its Deconvolution layer is:

    const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
    const int output_dim = stride_data[i] * (input_dim - 1)
    + kernel_extent - 2 * pad_data[i];

With a dilation argument equal to 1, this reduces to just:

    const int output_dim = stride_data[i] * (input_dim - 1)
    + kernel_shape_data[i] - 2 * pad_data[i];

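Note that this matches the formula in the Keras documentation when the parameter a is zero (o is the output size, i the input size, k the kernel size, s the stride and p the padding):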

Formula for calculation of the output shape [3], [4]: o = s (i - 1) + a + k - 2p
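
As a quick sanity check, the formula can be sketched in plain Python. The function name `deconv_output_dim` is made up for illustration, and the 28-pixel input size is an assumption about the `conv7_3norm` blob (it is not stated in the question). The remaining values come from the Caffe layer definition.

```python
def deconv_output_dim(input_dim, kernel_size, stride, pad=0, dilation=1, a=0):
    """Output size along one spatial axis of a transposed convolution.

    Mirrors the Caffe computation quoted above; `a` is the extra term
    from the Keras formula and is 0 for Caffe.
    """
    kernel_extent = dilation * (kernel_size - 1) + 1
    return stride * (input_dim - 1) + a + kernel_extent - 2 * pad


# Values from the conv8_1 layer: kernel_size=4, stride=2, pad=1, dilation=1.
# input_dim=28 is an assumed size for the bottom blob, not taken from the question.
print(deconv_output_dim(28, kernel_size=4, stride=2, pad=1))  # 56
```

Under that assumption the spatial size comes out as 56, which agrees with the 56x56 shape quoted from the paper.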

How to verify actual output shape with your Keras backend

This is tricky, because the actual output shape depends on the backend implementation and configuration, and Keras is currently unable to determine it on its own. So you actually have to run a prediction on some dummy input to find the actual output shape. Here's an example of how to do this, from the Keras docs for Deconvolution2D:

To pass the correct `output_shape` to this layer,
one could use a test model to predict and observe the actual output shape.
# Examples
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Deconvolution2D

# apply a 3x3 transposed convolution with stride 1x1 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', input_shape=(3, 12, 12)))
# Note that you will have to change the output_shape depending on the backend used.
# we can predict with the model and print the shape of the array.
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (None, 3, 13, 13)
# Theano CPU: (None, 3, 14, 14)
# TensorFlow: (None, 14, 14, 3)
```

Reference: https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py#L507

You might also be curious why the output_shape parameter apparently doesn't define the output shape. According to the post "Deconvolution2D layer in keras", this is why:

Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer, and instead they try to deduce it from the input, the kernel size and the stride, while assuming only valid output_shapes are supplied (though it's not checked in the code to be the case). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could've been determined by Keras from the given input shape, output shape and kernel size).
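
To illustrate the last point of that quote, here is a minimal sketch of how the stride could be recovered from the input size, output size, kernel size and padding by inverting o = s (i - 1) + k - 2p (taking a = 0). The helper name `infer_stride` is hypothetical and is not part of Keras, and the 28-pixel input size is an assumption; the kernel size, padding and 56-pixel output come from the question and the answer.

```python
def infer_stride(input_dim, output_dim, kernel_size, pad=0):
    """Solve o = s * (i - 1) + k - 2p for the stride s (with a = 0).

    Raises ValueError when no positive integer stride fits the shapes.
    """
    if input_dim < 2:
        raise ValueError("stride is undetermined for input_dim < 2")
    numerator = output_dim - kernel_size + 2 * pad
    stride, remainder = divmod(numerator, input_dim - 1)
    if remainder != 0 or stride < 1:
        raise ValueError("shapes are inconsistent with an integer stride")
    return stride


# Assumed 28 -> 56 upsampling with kernel_size=4 and pad=1, as in conv8_1.
print(infer_stride(28, 56, kernel_size=4, pad=1))  # 2
```

The result matches the `stride: 2` in the Caffe layer, which corresponds to `subsample=(2,2)` in the Keras call.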