Concatenation step of U-Net for unequal number of channels


I am trying to implement the U-Net architecture for image segmentation. While implementing the crop-and-concatenate step in the expansive path, I am unable to understand how feature maps with unequal numbers of channels are concatenated.

[Image: U-Net architecture diagram]

According to the architecture, the input from the first upsampling step has to be concatenated with the corresponding output from the contracting path, but the problem is that the number of channels in the contracting path is 512 while after the upsampling step it is 1024, so how are they supposed to be concatenated? My code for the crop and concatenate step is:

    import torch
    import torch.nn.functional as F

    def crop_and_concat(self, upsampled, bypass, crop=False):
        if crop:
            # negative padding crops `bypass` to the spatial size of `upsampled`
            c = (bypass.size()[2] - upsampled.size()[2]) // 2
            bypass = F.pad(bypass, (-c, -c, -c, -c))
        return torch.cat((upsampled, bypass), 1)  # concat along channel dim
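
For context, here is a quick shape check of what this helper produces (the tensor sizes are illustrative; since `self` is unused, it can be called as a plain function):

    upsampled = torch.randn(4, 512, 56, 56)  # decoder output after up-conv
    bypass = torch.randn(4, 512, 64, 64)     # encoder feature map (spatially larger)
    out = crop_and_concat(None, upsampled, bypass, crop=True)
    print(out.shape)  # torch.Size([4, 1024, 56, 56]) -- 512 + 512 channels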

The error I am receiving:

    RuntimeError: Given groups=1, weight of size 128 256 5 5, expected input[4, 384, 64, 64] to have 256 channels, but got 384 channels instead

Where am I going wrong?


Solution

  • First of all, you don't have to be so strict when it comes to U-Net-like architectures; there have been many derivatives since (see for example the fastai variation with PixelShuffle).

    In the case of the encoder, in the basic version, your channels go (per block):

    1 - 64 - 128 - 256 - 512
    

    This is a standard convolutional encoder. After that comes the shared bottleneck layer with 1024 channels.
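
    A minimal sketch of that progression (the `double_conv` helper is hypothetical; two 3x3 convolutions per block, as in the original paper):

        import torch.nn as nn

        def double_conv(in_ch, out_ch):
            # two 3x3 convolutions per block, as in the original U-Net
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3),
                nn.ReLU(inplace=True),
            )

        channels = [1, 64, 128, 256, 512]
        encoder = nn.ModuleList(
            [double_conv(c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]
        )
        bottleneck = double_conv(512, 1024)  # the shared 1024-channel layer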

    In the decoder, the channel count goes back down, but each block takes more input channels because you concatenate the encoder states from the corresponding block.

    It would be:

    1024 -> 512 (up-conv), then 512 (decoder) + 512 (encoder) = 1024 total -> 512

    512 -> 256 (up-conv), then 256 (decoder) + 256 (encoder) = 512 total -> 256

    and so on.
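
    A minimal sketch of one such decoder step (illustrative shapes; the transposed convolution halves the channels before the skip connection is concatenated):

        import torch
        import torch.nn as nn

        up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)  # 1024 -> 512
        conv = nn.Sequential(  # (512 decoder + 512 encoder) -> 512
            nn.Conv2d(512 + 512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

        x = torch.randn(4, 1024, 28, 28)    # bottleneck output
        skip = torch.randn(4, 512, 56, 56)  # matching encoder feature map
        x = up(x)                           # [4, 512, 56, 56]
        x = torch.cat((x, skip), dim=1)     # [4, 1024, 56, 56] -- 512 + 512
        x = conv(x)                         # [4, 512, 56, 56]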

    You hit the case where the 256 channels from the decoder were taken into account, but the 128 added from the encoder were not. Just raise that layer's input channels to 256 + 128 = 384 and follow the above scheme for each block of your U-Net, as in the sketch below.
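
    A sketch of the fix for the failing layer (assuming the 5x5 kernel and batch size reported in your error message):

        import torch
        import torch.nn as nn

        # before: nn.Conv2d(256, 128, kernel_size=5) -- rejects the concatenated input
        conv = nn.Conv2d(256 + 128, 128, kernel_size=5)  # expects 384 channels

        x = torch.randn(4, 384, 64, 64)  # 256 (decoder) + 128 (encoder) after concat
        print(conv(x).shape)             # torch.Size([4, 128, 60, 60])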