I'm training a segmentation model with the U-net architecture. The input image size is 250x250.
Currently, I've manually tweaked the paddings of some of the convolutional layers to ensure that the model output is of the same size, i.e. 250x250.
But when I input a differently sized image, for example a 500x500 one, the output size is 506x506.
How do I make sure the output size remains the same as the input size, for any input?
You can use a "Crop" layer to force the output shape to be identical to the input shape.
With a U-net I suggest using a crop layer after every upsampling step, not only at the end, to avoid accumulating padding errors.
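For instance, here is a minimal sketch of that idea in PyTorch (the "Crop" layer name itself comes from Caffe; `center_crop` and `decoder_step` are just illustrative helper names, and the upsampling is done with `F.interpolate`, though a transposed convolution would work the same way):

```python
import torch
import torch.nn.functional as F

def center_crop(x: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    """Crop an NCHW tensor to (target_h, target_w) around its spatial center."""
    _, _, h, w = x.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return x[:, :, top:top + target_h, left:left + target_w]

def decoder_step(up_input: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
    """Upsample by 2, then crop to the skip connection's size before concatenating,
    so padding/rounding errors cannot accumulate across decoder levels."""
    up = F.interpolate(up_input, scale_factor=2, mode="nearest")
    up = center_crop(up, skip.shape[2], skip.shape[3])
    return torch.cat([up, skip], dim=1)
```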
Regarding "padding errors":
Suppose you have an input of shape 100x100 and you downsample it 3 times by a factor of 2; you'll end up with a 13x13 feature map (because of rounding up: 100 --> 50 --> 25 --> 13).
Now, if you upsample three times, by x2 each time:
13x13 --> 26x26 --> 52x52 --> 104x104
You have 4 "additional" pixels that were added due to padding/rounding (in your question you have 6).
However, if you "Crop" after each upsample:
13x13 --> 26x26 --crop--> 25x25 --> 50x50 --> 100x100
You see that only the first upsample requires a non-trivial crop, and the error at that level is only 1 pixel, not 4.
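To make the rounding concrete, here is a small Python sketch that reproduces both size chains above (the halving rounds up, matching the 25 --> 13 step; `downsample_sizes`, `upsample_no_crop`, and `upsample_with_crop` are just illustrative helpers):

```python
import math

def downsample_sizes(size: int, steps: int) -> list:
    """Halve the spatial size `steps` times, rounding up at each step."""
    sizes = [size]
    for _ in range(steps):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes

def upsample_no_crop(bottom: int, steps: int) -> list:
    """Double the size each step without any cropping."""
    sizes = [bottom]
    for _ in range(steps):
        sizes.append(sizes[-1] * 2)
    return sizes

def upsample_with_crop(bottom: int, encoder_sizes: list) -> list:
    """Double the size, then crop back to the matching encoder size at each level."""
    sizes = [bottom]
    for target in reversed(encoder_sizes[:-1]):
        sizes.append(min(sizes[-1] * 2, target))
    return sizes

enc = downsample_sizes(100, 3)            # [100, 50, 25, 13]
print(upsample_no_crop(enc[-1], 3))       # [13, 26, 52, 104] -> 4 extra pixels
print(upsample_with_crop(enc[-1], enc))   # [13, 25, 50, 100] -> back to 100x100
```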