python, deep-learning, pytorch, autoencoder

Decoder upsample size


I have a simple autoencoder which looks like this:

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()

        self.channels = [3, 8, 16, 32]

        self.encoder = nn.Sequential(
                # layer 1
                nn.Conv2d(self.channels[0], self.channels[1], 3, 1, 1),
                nn.BatchNorm2d(self.channels[1]),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=4, stride=4),
                # layer 2
                nn.Conv2d(self.channels[1], self.channels[2], 3, 1, 1),
                nn.BatchNorm2d(self.channels[2]),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=4, stride=4),
                # layer 3
                nn.Conv2d(self.channels[2], self.channels[3], 3, 1, 1),
                nn.BatchNorm2d(self.channels[3]),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        
        self.decoder = nn.Sequential(
                # layer 1: upsample to a fixed spatial size
                nn.Conv2d(self.channels[3], self.channels[2], 3, 1, 1),
                nn.ReLU(inplace=True),
                nn.Upsample(size=(15, 20), mode='bilinear', align_corners=False),

                # layer 2: upsample by a factor of 4
                nn.Conv2d(self.channels[2], self.channels[1], 3, 1, 1),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),

                # layer 3: 1x1 conv to a single channel, upsample by a factor of 4
                nn.Conv2d(self.channels[1], 1, 1),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)
            )

    def forward(self, x):
        h = self.encoder(x)
        y = self.decoder(h)
        return y

Can someone clarify whether the nn.Upsample(size=(15, 20), ...) in the decoder has something to do with restoring the images to their original dimensions? My input images (torch tensors) are of size 240*320, and the network is supposed to restore the input tensor to its original size.

Also, if I add a fourth layer with 64 channels to my encoder and a fourth layer to my decoder (similar to the layers above), what should the upsampling size be in the first layer of the decoder?


Solution

  • As mentioned in the PyTorch documentation, you may define an upsampling layer with either a scale factor or an output size. Specifying size=(H, W) guarantees that the output spatial size is (H, W), regardless of the input size. On the other hand, specifying scale_factor=4 makes each spatial dimension of the output 4 times that of the input, e.g. (30, 40) -> (120, 160). The first sketch below illustrates both options.

    You mention that you need an output of size 240*320. Your encoder reduces a 240*320 input to a 7*10 feature map (240/4/4/2 = 7 and 320/4/4/2 = 10, after flooring), so the first upsampling layer restores a fixed size of (15, 20), and the two scale_factor=4 layers take it from there: the spatial dimensions of the decoder output are (15*4*4, 20*4*4), which is conveniently (240, 320). The second sketch below verifies this end to end.
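
    To illustrate the difference between the two options, here is a minimal sketch on a dummy feature map (the shapes are chosen only for demonstration):

import torch
import torch.nn as nn

x = torch.randn(1, 32, 7, 10)  # dummy 4D feature map, e.g. the encoder output

# size=(H, W): the output spatial size is fixed, regardless of the input size
up_fixed = nn.Upsample(size=(15, 20), mode='bilinear', align_corners=False)
print(up_fixed(x).shape)   # torch.Size([1, 32, 15, 20])

# scale_factor=4: every spatial dimension is multiplied by 4
up_scaled = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)
print(up_scaled(x).shape)  # torch.Size([1, 32, 28, 40])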
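
    And to check the whole network end to end, a minimal sketch assuming a batch of one 240*320 RGB input (Autoencoder is the class from the question):

import torch

model = Autoencoder()
x = torch.randn(1, 3, 240, 320)  # dummy 240*320 RGB input

h = model.encoder(x)
print(h.shape)  # torch.Size([1, 32, 7, 10])   (240/4/4/2 = 7, 320/4/4/2 = 10)

y = model.decoder(h)
print(y.shape)  # torch.Size([1, 1, 240, 320]) (single channel, from the last Conv2d)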