Tags: python, tensorflow, keras, conv-neural-network, mobilenet

Conv2DTranspose produces the wrong output shape


I am currently trying to modify MobileNetV2 so that it detects certain objects in an image and returns a heatmap marking their positions. For that it is necessary that the heatmap has exactly the same resolution as the input image.

My approach is to build a U-Net-like encoder-decoder network that uses Conv2DTranspose to scale the MobileNet output back to its original shape, with shortcut paths from each corresponding convolution that decreases the resolution.

The first concatenation between corresponding layers works well; the second, however, fails because the shapes of their outputs don't match. The first Conv2DTranspose increases the resolution by a factor of 2, as I anticipated. The second one, however, does not: it has the input shape (None, 20, 80, 192) and is supposed to output (None, 40, 160, 144), but the actual output shape turns out to be (None, 36, 156, 144), making a concatenation of the layers impossible.

How can I achieve consistent output shapes? I thought that is what padding='same' was supposed to guarantee. Help is much appreciated!
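
For reference, with strides=(2, 2) and padding='same', Conv2DTranspose does guarantee an output of exactly twice the input resolution. A minimal standalone check (not from the original post; the shapes mirror the ones reported for the failing layer):

import tensorflow as tf
from tensorflow.keras.layers import Conv2DTranspose

x = tf.zeros((1, 20, 80, 192))
y = Conv2DTranspose(filters=144, kernel_size=(3, 3), strides=(2, 2),
                    padding='same')(x)
print(y.shape)  # (1, 40, 160, 144): spatial dimensions exactly doubled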

So far I have tried changing the padding type, setting the output_padding parameter, and varying the stride and kernel size. None of these, somewhat surprisingly, affected the output shape in the desired way.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, concatenate

base_model = MobileNetV2(input_shape=(imageShape[0], imageShape[1], 3),
                         include_top=False, weights='imagenet')
# get_conv_layers() is the author's helper that collects the convolution
# layers whose outputs serve as skip connections.
conv_layers = get_conv_layers(base_model)

x = base_model.output

# First decoder step: upsample x2 and merge with the matching skip connection.
c = conv_layers.pop()  # discard the last (deepest) entry ...
c = conv_layers.pop()  # ... and use the next one as the skip connection
x = Conv2DTranspose(filters=c.output_shape[-1],
                    kernel_size=(3, 3), strides=(2, 2), 
                    activation='relu', padding='same', 
                    kernel_initializer='he_normal')(x)
x = concatenate([c.output, x], axis=-1)
x = Conv2D(filters=c.output_shape[-1], kernel_size=(3, 3),
           activation='relu')(x)

# Second decoder step: this is where the shape mismatch occurs.
c = conv_layers.pop()
x = Conv2DTranspose(filters=c.output_shape[-1],
                    kernel_size=(3, 3), strides=(2, 2), 
                    activation='relu', padding='same',
                    kernel_initializer='he_normal')(x)
x = concatenate([c.output, x], axis=-1)
x = Conv2D(filters=c.output_shape[-1], kernel_size=(3, 3),
           activation='relu')(x)

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 40, 160, 144), (None, 36, 156, 144)]

The first shape is the desired output shape of the Conv2DTranspose, the second the actual one. These ought to be the same for the concatenation to work.
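
A quick way to see where the four pixels go missing is to replay the two layers between the concatenations in isolation. This is a reconstruction, not code from the original post: the 336-channel input is an assumption standing in for the first concatenation's output, and the 192 filters for the intermediate convolution are inferred from the (None, 20, 80, 192) shape in the question:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Conv2DTranspose

x = tf.zeros((1, 20, 80, 336))  # stand-in for the first concatenation's output
x = Conv2D(filters=192, kernel_size=(3, 3), activation='relu')(x)  # default padding='valid'
print(x.shape)  # (1, 18, 78, 192): the 3x3 kernel trims two pixels per spatial dimension
x = Conv2DTranspose(filters=144, kernel_size=(3, 3), strides=(2, 2),
                    padding='same')(x)
print(x.shape)  # (1, 36, 156, 144): the mismatched shape from the error message

So the tensor reaching the second Conv2DTranspose is 18x78 rather than 20x80, and doubling it yields 36x156.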


Solution

  • Ok, so I got it figured out; sometimes you simply have to step away from a problem for a while. It turns out I was so focused on Conv2DTranspose being the culprit that I completely overlooked the other layers in between that could cause the issue. I had forgotten to set the padding of the plain Conv2D to 'same': with the default padding='valid', the 3x3 Conv2D after the first concatenation shrinks 20x80 to 18x78, which the second stride-2 Conv2DTranspose then upsamples to 36x156 instead of 40x160. Setting this parameter correctly solved the problem, and I get the expected output shape.
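
A minimal sketch of the corrected upsampling step, factored into a hypothetical decoder_block helper (skip corresponds to c.output in the original code); the only substantive change is padding='same' on the trailing Conv2D:

from tensorflow.keras.layers import Conv2D, Conv2DTranspose, concatenate

def decoder_block(x, skip):
    # One U-Net-style step: double the resolution, merge the skip
    # connection, then refine with a resolution-preserving convolution.
    filters = skip.shape[-1]
    x = Conv2DTranspose(filters=filters, kernel_size=(3, 3), strides=(2, 2),
                        activation='relu', padding='same',
                        kernel_initializer='he_normal')(x)
    x = concatenate([skip, x], axis=-1)
    x = Conv2D(filters=filters, kernel_size=(3, 3), activation='relu',
               padding='same')(x)  # 'same' keeps height and width unchanged
    return x

With every layer now either exactly doubling (the transposes) or exactly preserving (the 'same'-padded convolutions) the spatial dimensions, each decoder output lines up with its skip connection and the concatenations succeed.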