I have a simple autoencoder which looks like this:
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.channels = [3, 8, 16, 32]
        self.encoder = nn.Sequential(
            # layer 1
            nn.Conv2d(self.channels[0], self.channels[1], 3, 1, 1),
            nn.BatchNorm2d(self.channels[1]),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=4, stride=4),
            # layer 2
            nn.Conv2d(self.channels[1], self.channels[2], 3, 1, 1),
            nn.BatchNorm2d(self.channels[2]),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=4, stride=4),
            # layer 3
            nn.Conv2d(self.channels[2], self.channels[3], 3, 1, 1),
            nn.BatchNorm2d(self.channels[3]),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        self.decoder = nn.Sequential(
            nn.Conv2d(self.channels[3], self.channels[2], 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Upsample(size=(15, 20), mode='bilinear', align_corners=False),
            nn.Conv2d(self.channels[2], self.channels[1], 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(self.channels[1], 1, 1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)
        )

    def forward(self, x):
        h = self.encoder(x)
        y = self.decoder(h)
        return y
Can someone clarify whether the nn.Upsample(size=(15, 20), ...) in the decoder has something to do with restoring the images to their original dimensions? My input images (torch tensors) are of size 240x320, and the network is supposed to restore the input tensor to its original size.
Also, if I add a fourth layer with 64 channels to my encoder and a fourth layer to my decoder (similar to the layers above), what should the upsampling size be in the first layer of the decoder?
As mentioned in the PyTorch documentation, you may define an upsampling layer with either a scale factor or an output size. Stating size=(H, W) ensures the output size will be (H, W) regardless of the input size. On the other hand, stating scale_factor=4 makes each spatial dimension of the output 4 times that of the input, e.g. (30, 40) -> (120, 160).
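A quick check of the two options on dummy tensors (shapes chosen arbitrarily here) illustrates the difference:

```python
import torch
import torch.nn as nn

# size=(H, W): output is always (H, W), whatever the input's spatial size
up_fixed = nn.Upsample(size=(15, 20), mode='bilinear', align_corners=False)
print(up_fixed(torch.randn(1, 32, 7, 10)).shape)   # torch.Size([1, 32, 15, 20])
print(up_fixed(torch.randn(1, 32, 3, 5)).shape)    # torch.Size([1, 32, 15, 20])

# scale_factor=4: each spatial dimension of the output is 4x the input
up_scaled = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)
print(up_scaled(torch.randn(1, 8, 30, 40)).shape)  # torch.Size([1, 8, 120, 160])
```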
You mention you need an output of size 240x320. The spatial dimensions of your decoder's output will be (15*4*4, 20*4*4), which is conveniently (240, 320).
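To see where the (15, 20) comes from, you can trace the spatial dimensions through the encoder: the 3x3 convolutions with stride 1 and padding 1 preserve H and W, so only the pooling layers change the spatial size.

```python
import torch
import torch.nn as nn

# Trace a 240x320 input through the three pooling layers of the encoder;
# the convolutions in between do not change the spatial dimensions.
x = torch.randn(1, 3, 240, 320)
for pool in [nn.MaxPool2d(4, 4), nn.MaxPool2d(4, 4), nn.MaxPool2d(2, 2)]:
    x = pool(x)
    print(tuple(x.shape[2:]))
# (60, 80)
# (15, 20)
# (7, 10)
```

So the encoder's output is 7x10, and the first Upsample(size=(15, 20)) restores the feature map to the size it had before the last MaxPool2d(2, 2); the two scale_factor=4 upsamples then bring it back to 240x320. For the fourth-layer question: assuming the new encoder layer ends in another MaxPool2d(2, 2) (an assumption; it depends on the pool you choose), the bottleneck becomes 3x5, and the first decoder upsample should then be size=(7, 10), with the size=(15, 20) upsample moved to the second decoder layer. Note that (7, 10) -> (15, 20) is not an exact factor of 2 (15 // 2 floors to 7), which is why an explicit size is needed there rather than a scale_factor.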