I'm training a segmentation model with the U-net architecture. The input image size is 250x250.
Currently, I've manually tweaked the paddings of some of the convolutional layers to ensure that the model output is of the same size, i.e. 250x250.
But when I input a differently sized image, for example a 500x500 one, the output size is 506x506.
How do I make sure the output size remains the same as the input size, for any input?
You can use a "Crop" layer to force the output shape to be identical to the input shape.
With a U-net I suggest using a crop layer after every upsampling step, not only at the end, to avoid accumulating padding errors.
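For instance, here is a minimal sketch of that idea in PyTorch (the "Crop" layer name itself comes from Caffe; `center_crop` and `decoder_step` are just illustrative helper names, and the upsampling is done with `F.interpolate`, though a transposed convolution would work the same way):

```python
import torch
import torch.nn.functional as F

def center_crop(x: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    """Crop an NCHW tensor to (target_h, target_w) around its spatial center."""
    _, _, h, w = x.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return x[:, :, top:top + target_h, left:left + target_w]

def decoder_step(up_input: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
    """Upsample by 2, then crop to the skip connection's size before concatenating,
    so padding/rounding errors cannot accumulate across decoder levels."""
    up = F.interpolate(up_input, scale_factor=2, mode="nearest")
    up = center_crop(up, skip.shape[2], skip.shape[3])
    return torch.cat([up, skip], dim=1)
```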
Regarding "padding errors":
Suppose you have an input of shape 100x100 and you downsample it 3 times by a factor of 2; you'll end up with a 13x13 feature map (because of rounding up: 100 --> 50 --> 25 --> 13).
Now, if you upsample three times, by x2 each time:
13x13 --> 26x26 --> 52x52 --> 104x104
You have 4 "additional" pixels that were added due to padding/rounding (in your question you have 6).
However, if you "Crop" after each upsample:
13x13 --> 26x26 --crop--> 25x25 --> 50x50 --> 100x100
You see that only the first upsample requires a non-trivial crop, and the error at that level is only 1 pixel, not 4.
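To make the rounding concrete, here is a small Python sketch that reproduces both size chains above (the halving rounds up, matching the 25 --> 13 step; `downsample_sizes`, `upsample_no_crop`, and `upsample_with_crop` are just illustrative helpers):

```python
import math

def downsample_sizes(size: int, steps: int) -> list:
    """Halve the spatial size `steps` times, rounding up at each step."""
    sizes = [size]
    for _ in range(steps):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes

def upsample_no_crop(bottom: int, steps: int) -> list:
    """Double the size each step without any cropping."""
    sizes = [bottom]
    for _ in range(steps):
        sizes.append(sizes[-1] * 2)
    return sizes

def upsample_with_crop(bottom: int, encoder_sizes: list) -> list:
    """Double the size, then crop back to the matching encoder size at each level."""
    sizes = [bottom]
    for target in reversed(encoder_sizes[:-1]):
        sizes.append(min(sizes[-1] * 2, target))
    return sizes

enc = downsample_sizes(100, 3)            # [100, 50, 25, 13]
print(upsample_no_crop(enc[-1], 3))       # [13, 26, 52, 104] -> 4 extra pixels
print(upsample_with_crop(enc[-1], enc))   # [13, 25, 50, 100] -> back to 100x100
```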