I am new to PyTorch and wanted to build a simple autoencoder for 255x255 RGB images to play around with it, but the output shape isn't the same as the input shape.
Here's the model:
class AutoEncoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(in_channels=32, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(in_channels=128, out_channels=32, kernel_size=3, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(in_channels=32, out_channels=3, kernel_size=3, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
And here are the shapes given by the torchsummary package
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 32, 255, 255]             896
              ReLU-2         [-1, 32, 255, 255]               0
         MaxPool2d-3         [-1, 32, 127, 127]               0
            Conv2d-4        [-1, 128, 127, 127]          36,992
              ReLU-5        [-1, 128, 127, 127]               0
         MaxPool2d-6          [-1, 128, 63, 63]               0
            Conv2d-7          [-1, 128, 63, 63]         147,584
              ReLU-8          [-1, 128, 63, 63]               0
   ConvTranspose2d-9           [-1, 32, 66, 66]          36,896
             ReLU-10           [-1, 32, 66, 66]               0
  ConvTranspose2d-11            [-1, 3, 69, 69]             867
          Sigmoid-12            [-1, 3, 69, 69]               0
I have seen in another post that the output_padding option in the decoder would help fix the output shape, but it hasn't worked for me.
I don't know what the problem might be; coming from TensorFlow I would have used an Upscale layer, but from what I've seen that isn't the way to do it in PyTorch.
Could anyone explain why the shapes are broken with my current model? Thanks.
The main parameter controlling the upscaling of the input is stride. For nn.ConvTranspose2d with the default padding=0 and output_padding=0, the output size is (H_in - 1) * stride + kernel_size, so setting stride=2 with kernel_size=2 will exactly double the input size.
In your case, use stride=2 with kernel_size=3 to get a "doubling + 1" size transformation from each upconv layer. The first layer will produce an output sized 2 x 63 + 1 = 127, and the second will yield 2 x 127 + 1 = 255.
Example:
import torch
import torch.nn as nn

x = torch.rand(1, 128, 63, 63)  # the output from Conv2d-7 is shaped (63, 63)
x = nn.ConvTranspose2d(128, 32, kernel_size=3, stride=2)(x)  # 2h + 1 upconv
print(x.shape)
# out> torch.Size([1, 32, 127, 127])
x = nn.ConvTranspose2d(32, 3, kernel_size=3, stride=2)(x)  # 2h + 1 upconv
print(x.shape)
# out> torch.Size([1, 3, 255, 255])
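
Putting it together, here is a minimal sketch of the full model with the decoder rewritten this way (same layer widths and encoder as in your question; the only changes are dropping output_padding and adding stride=2 to the transposed convolutions):

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # encoder unchanged: 255 -> 127 -> 63 spatially
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        # decoder: each strided upconv maps H -> 2H + 1, so 63 -> 127 -> 255
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 32, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=3, stride=2),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# shape check
model = AutoEncoder()
out = model(torch.rand(1, 3, 255, 255))
print(out.shape)  # torch.Size([1, 3, 255, 255])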