
PyTorch convolutional autoencoder, output dimensions different from input


I am new to working with PyTorch and wanted to build a simple autoencoder for 255x255 RGB images to play around with, but the output shape isn't the same as the input shape.

Here's the model:

from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()

        # Encoder: two conv blocks, each halving the spatial size with max pooling
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=32, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Decoder: intended to upsample back to the 255x255 input size
        self.decoder = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(in_channels=128, out_channels=32, kernel_size=3, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

And here are the shapes reported by the torchsummary package:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 32, 255, 255]             896
              ReLU-2         [-1, 32, 255, 255]               0
         MaxPool2d-3         [-1, 32, 127, 127]               0
            Conv2d-4        [-1, 128, 127, 127]          36,992
              ReLU-5        [-1, 128, 127, 127]               0
         MaxPool2d-6          [-1, 128, 63, 63]               0
            Conv2d-7          [-1, 128, 63, 63]         147,584
              ReLU-8          [-1, 128, 63, 63]               0
   ConvTranspose2d-9           [-1, 32, 66, 66]          36,896
             ReLU-10           [-1, 32, 66, 66]               0
  ConvTranspose2d-11            [-1, 3, 69, 69]             867
          Sigmoid-12            [-1, 3, 69, 69]               0

I have seen in another post that the output_padding option in the decoder would fix the output shape, but it hasn't worked for me.

I don't know what the problem might be. Coming from TensorFlow, I would have used an UpSampling2D layer, but from what I've seen that isn't the way to do it in PyTorch.

Could anyone explain to me why my shapes are broken with my current model? Thanks


Solution

  • The main parameter controlling the upscaling of the input is stride=. Setting stride=2 with kernel_size=2 will exactly double the input size (a quick check of this is sketched after the example below).

    In your case, use stride=2 with kernel_size=3 to get a "double plus one" size transformation from each upconv layer. The first ConvTranspose2d will produce an output sized 2 x 63 + 1 = 127, and the second will yield 2 x 127 + 1 = 255. A full revised decoder along these lines is sketched at the end of this answer.

    Example:

    x = torch.rand(1, 128, 63, 63)  # the output from Conv2d-7 is shaped (63, 63)
    
    x = nn.ConvTranspose2d(128, 32, kernel_size=3, stride=2)(x)  # 2h + 1 upconv
    print(x.shape)
    #out> torch.Size([1, 32, 127, 127])
    
    x = nn.ConvTranspose2d(32, 3, kernel_size=3, stride=2)(x)  # 2h + 1 upconv
    print(x.shape)
    #out> torch.Size([1, 3, 255, 255])
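
  • As a quick check of the exact-doubling claim above: with kernel_size=2 and stride=2, the output size is (H - 1) x 2 + 2 = 2H. Below is a minimal sketch of this, plus the general output-size formula from the PyTorch ConvTranspose2d docs (the convtranspose2d_out helper is just for illustration, not a library function):

    import torch
    from torch import nn
    
    # kernel_size=2 with stride=2: (63 - 1) * 2 + 2 = 126, i.e. exactly doubled
    x = torch.rand(1, 128, 63, 63)
    x = nn.ConvTranspose2d(128, 128, kernel_size=2, stride=2)(x)
    print(x.shape)
    #out> torch.Size([1, 128, 126, 126])
    
    # General ConvTranspose2d output size (per the PyTorch docs):
    # H_out = (H_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
    def convtranspose2d_out(h_in, kernel_size, stride=1, padding=0, output_padding=0, dilation=1):
        return (h_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1
    
    print(convtranspose2d_out(63, kernel_size=3, stride=2))   #out> 127
    print(convtranspose2d_out(127, kernel_size=3, stride=2))  #out> 255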
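
  • Putting it together, a minimal sketch of a revised decoder (keeping the encoder and layer widths unchanged, and dropping output_padding since the stride-2 layers already land on 255) could look like this:

    self.decoder = nn.Sequential(
        nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
        nn.ReLU(),
        # 63 -> 2*63 + 1 = 127
        nn.ConvTranspose2d(in_channels=128, out_channels=32, kernel_size=3, stride=2),
        nn.ReLU(),
        # 127 -> 2*127 + 1 = 255
        nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, stride=2),
        nn.Sigmoid()
    )

    With this decoder, torchsummary should report a final output shape of [-1, 3, 255, 255], matching the input.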