Tags: python, deep-learning, pytorch, image-segmentation, torchvision

RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[4, 1, 1080, 1920] to have 64 channels, but got 1 channels instead


I want to train a U-Net segmentation model on the German Asphalt Pavement Distress (GAPs) dataset. I'm trying to modify the model at https://github.com/khanhha/crack_segmentation to train on that dataset.

Here is the folder containing all the related files and folders: https://drive.google.com/drive/folders/14NQdtMXokIixBJ5XizexVECn23Jh9aTM?usp=sharing

I modified the training file and renamed it "train_unet_GAPs.py". When I try to train on Colab using the following command:

!python /content/drive/Othercomputers/My\ Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py -data_dir "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/GAPs/" -model_dir /content/drive/Othercomputers/My\ Laptop/crack_segmentation_khanhha/crack_segmentation-master/model/ -model_type resnet101

I get the following error:

total images = 2410
create resnet101 model
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100% 171M/171M [00:00<00:00, 212MB/s]
Started training model from epoch 0
Epoch 0:   0% 0/2048 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py", line 259, in <module>
    train(train_loader, model, criterion, optimizer, validate, args)
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/train_unet_GAPs.py", line 118, in train
    masks_pred = model(input_var)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/Othercomputers/My Laptop/crack_segmentation_khanhha/crack_segmentation-master/unet/unet_transfer.py", line 224, in forward
    conv2 = self.conv2(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torchvision/models/resnet.py", line 144, in forward
    out = self.conv1(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 64, 1, 1], expected input[4, 1, 1080, 1920] to have 64 channels, but got 1 channels instead
Epoch 0:   0% 0/2048 [00:08<?, ?it/s]

I think this is because the images in the GAPs dataset are grayscale (one channel), while ResNet expects RGB images with 3 channels.
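
To check my understanding, I ran a small test of my own (not from the repository) that feeds a one-channel batch of the same size into a plain torchvision ResNet:

import torch
import torchvision

resnet = torchvision.models.resnet101(pretrained=False)
print(resnet.conv1)
# -> Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

x = torch.rand(4, 1, 1080, 1920)  # grayscale batch shaped like the GAPs images
try:
    resnet(x)
except RuntimeError as e:
    print(e)  # a channel-mismatch error of the same kind as above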

How can I solve this issue? How can I modify the model to accept grayscale images instead of RGB images? I have no experience with PyTorch, and I think this implementation uses the built-in ResNet model.


Solution

  • I figured out a few things with your code.

    According to the traceback, you are using a ResNet-based U-Net model.

    Your current model's forward method is defined as:

    def forward(self, x):
        #conv1 = self.conv1(x)
        #conv2 = self.conv2(conv1)
        conv2 = self.conv2(x)
        conv3 = self.conv3(conv2)
        conv4 = self.conv4(conv3)
        conv5 = self.conv5(conv4)
        ...
    

    Your error comes from self.conv2(x): conv2 expects an input with 64 channels. That means something is missing, or... commented out :)
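
    You can see where the [64, 64, 1, 1] weight in the error comes from: conv2 here is ResNet-101's layer1 (see the model code further down), and its first convolution is a 1x1 layer that expects 64 input channels:

    import torchvision

    resnet101 = torchvision.models.resnet101(pretrained=False)
    print(resnet101.layer1[0].conv1)
    # -> Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)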

    Changing

        #conv1 = self.conv1(x)
        #conv2 = self.conv2(conv1)
        conv2 = self.conv2(x)
    

    into

        conv1 = self.conv1(x)
        conv2 = self.conv2(conv1) 
    

    will fix the problem of the 64 input channels. But there is another problem:

    Using an input of shape (B, 1, H, W), no matter what B, H, and W are, won't be possible with your current architecture. Why? Because of this:

    resnet34 = torchvision.models.resnet34(pretrained=False)
    resnet101 = torchvision.models.resnet101(pretrained=False)
    resnet152 = torchvision.models.resnet152(pretrained=False)
    
    print(resnet34.conv1)
    -> Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    
    print(resnet101.conv1)
    -> Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    
    print(resnet152.conv1)
    -> Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    

    In every case, the conv1 layer of ResNet takes a 3-channel input.
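
    If you would rather keep the pretrained 3-channel conv1 untouched, one alternative (my suggestion, not something the repository does) is to repeat the grayscale channel three times before feeding the batch to the encoder:

    import torch

    x = torch.rand(4, 1, 1080, 1920)   # grayscale batch
    x_rgb = x.repeat(1, 3, 1, 1)       # duplicate the single channel -> (4, 3, 1080, 1920)
    print(x_rgb.shape)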

    Once you have made those modifications, you should also try your network with a dummy example like:

    model = UNetResNet(34,num_classes=2)
    out = model(torch.rand(4,3,1920,1920))
    print(out.shape)
    -> (4,2,1920,1920) | (batch_size, num_classes, H, W)
    

    Why are the width and height the same here? Because your current architecture only supports square images.

    For example:

    -> (1080, 1920) = dimension mismatch during the concatenation part
    -> (1920, 1920) = success
    -> (108, 192)   = dimension mismatch during the concatenation part
    -> (192, 192)   = success
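
    One simple way to get square inputs (just a possible preprocessing step, not taken from the repository) is to pad the 1080x1920 frames up to 1920x1920 before feeding them to the network; resizing or cropping square patches would work as well. If you pad, remember to apply the same padding to the masks:

    import torch
    import torch.nn.functional as F

    x = torch.rand(4, 1, 1080, 1920)        # batch at the original GAPs resolution
    pad_h = x.shape[-1] - x.shape[-2]       # 1920 - 1080 = 840 rows to add
    x_square = F.pad(x, (0, 0, 0, pad_h))   # (left, right, top, bottom) padding on the last two dims
    print(x_square.shape)                   # torch.Size([4, 1, 1920, 1920])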
    

    Conclusion:

    • Modify your network to accept grayscale images if your dataset is made of grayscale images.
    • Preprocess your images so that width = height (for example by padding, as sketched above).

    Edit (device mismatch): here is the full modified model, with the replacement conv1 defined inside __init__ so it moves to the GPU together with the rest of the network:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision

    # DecoderBlockV2 and ConvRelu are assumed to be the helpers defined in the repo's unet/unet_transfer.py
    class UNetResNet(nn.Module):
    
        def __init__(self, encoder_depth, num_classes, num_filters=32, dropout_2d=0.2,
                     pretrained=False, is_deconv=False):
            super().__init__()
            self.num_classes = num_classes
            self.dropout_2d = dropout_2d
    
            if encoder_depth == 34:
                self.encoder = torchvision.models.resnet34(pretrained=pretrained)
                bottom_channel_nr = 512
            elif encoder_depth == 101:
                self.encoder = torchvision.models.resnet101(pretrained=pretrained)
                bottom_channel_nr = 2048
            elif encoder_depth == 152:
                self.encoder = torchvision.models.resnet152(pretrained=pretrained)
                bottom_channel_nr = 2048
            else:
                raise NotImplementedError('only the 34, 101, and 152 ResNet versions are implemented')
    
            self.pool = nn.MaxPool2d(2, 2)
    
            self.relu = nn.ReLU(inplace=True)
    
            #self.conv1 = nn.Sequential(self.encoder.conv1,
            #                           self.encoder.bn1,
            #                           self.encoder.relu,
            #                           self.pool)
    
            self.conv1 = nn.Sequential(nn.Conv2d(1,64,kernel_size=(7,7),stride=(2,2),padding=(3,3),bias=False), # in_channels=1 for grayscale input; use 3 for RGB/BGR
                                       nn.BatchNorm2d(64),
                                       nn.ReLU(),
                                       self.pool
                                    )
            
            self.conv2 = self.encoder.layer1
    
            self.conv3 = self.encoder.layer2
    
            self.conv4 = self.encoder.layer3
    
            self.conv5 = self.encoder.layer4
    
            self.center = DecoderBlockV2(bottom_channel_nr, num_filters * 8 * 2, num_filters * 8, is_deconv)
            self.dec5 = DecoderBlockV2(bottom_channel_nr + num_filters * 8, num_filters * 8 * 2, num_filters * 8, is_deconv)
            self.dec4 = DecoderBlockV2(bottom_channel_nr // 2 + num_filters * 8, num_filters * 8 * 2, num_filters * 8,
                                       is_deconv)
            self.dec3 = DecoderBlockV2(bottom_channel_nr // 4 + num_filters * 8, num_filters * 4 * 2, num_filters * 2,
                                       is_deconv)
            self.dec2 = DecoderBlockV2(bottom_channel_nr // 8 + num_filters * 2, num_filters * 2 * 2, num_filters * 2 * 2,
                                       is_deconv)
            self.dec1 = DecoderBlockV2(num_filters * 2 * 2, num_filters * 2 * 2, num_filters, is_deconv)
            self.dec0 = ConvRelu(num_filters, num_filters)
            self.final = nn.Conv2d(num_filters, num_classes, kernel_size=1)
    
        def forward(self, x):
            conv1 = self.conv1(x)
            conv2 = self.conv2(conv1)
            conv3 = self.conv3(conv2)
            conv4 = self.conv4(conv3)
            conv5 = self.conv5(conv4)
    
            pool = self.pool(conv5)
            center = self.center(pool)
    
            dec5 = self.dec5(torch.cat([center, conv5], 1))
    
            dec4 = self.dec4(torch.cat([dec5, conv4], 1))
            dec3 = self.dec3(torch.cat([dec4, conv3], 1))
            dec2 = self.dec2(torch.cat([dec3, conv2], 1))
            dec1 = self.dec1(dec2)
            dec0 = self.dec0(dec1)
    
            return self.final(F.dropout2d(dec0, p=self.dropout_2d))
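
    With those changes, a quick smoke test (assuming DecoderBlockV2 and ConvRelu from the repository's unet/unet_transfer.py are in scope) should now accept a single-channel, square batch:

    import torch

    model = UNetResNet(34, num_classes=2)
    out = model(torch.rand(4, 1, 1920, 1920))   # grayscale, square input
    print(out.shape)
    # -> torch.Size([4, 2, 1920, 1920]) | (batch_size, num_classes, H, W)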