
Extracting reduced dimension data from autoencoder in pytorch


I have defined my autoencoder in PyTorch as follows:

self.encoder = nn.Sequential(
    nn.Conv2d(input_shape[0], 32, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1, stride=1),
    nn.ReLU()
)

self.decoder = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(32, input_shape[0], kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Sigmoid()
)

I need to get a reduced-dimension encoding, which requires creating a new linear layer of dimension N, much lower than the image dimension, so that I can extract the activations.

If anybody can help me with fitting a linear layer into the decoder part I would appreciate it (I know how to Flatten() the data, but I guess I need to "unflatten" it again to interface with the following Conv2d layers).
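For concreteness, here is roughly the arrangement I have in mind (only a sketch; C, H, W and N are placeholder values for the input channels, spatial size, and bottleneck dimension, and since the convolutions use kernel_size=1 and stride=1 the spatial size stays H x W, so the flattened size is 64*H*W):

import torch
import torch.nn as nn

C, H, W, N = 3, 32, 32, 8   # hypothetical input shape and bottleneck size

encoder = nn.Sequential(
    nn.Conv2d(C, 32, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Flatten(),                  # [b, 64, H, W] -> [b, 64*H*W]
    nn.Linear(64 * H * W, N)       # the N-dimensional code to extract
)

decoder = nn.Sequential(
    nn.Linear(N, 64 * H * W),
    nn.ReLU(),
    nn.Unflatten(1, (64, H, W)),   # "unflatten" back to [b, 64, H, W] for the Conv2d layers
    nn.Conv2d(64, 32, kernel_size=1, stride=1),
    nn.ReLU(),
    nn.Conv2d(32, C, kernel_size=1, stride=1),
    nn.Sigmoid()
)

x = torch.randn(1, C, H, W)
code = encoder(x)                  # torch.Size([1, N]) -- the reduced-dimension encoding
recon = decoder(code)              # torch.Size([1, C, H, W])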

Update:

I have come up with the following based on the first answer (it gives me an 8-dimensional bottleneck at the output of the encoder, which works fine: torch.Size([1, 8, 1, 1])).

self.encoder = nn.Sequential(
    nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1)
)

self.decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(32, input_shape[0], kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Sigmoid()
)
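For reference, a quick shape check of the encoder above reproduces that bottleneck; the 4-channel 84x84 input below is only an assumed example size that happens to yield [1, 8, 1, 1], not necessarily the real input_shape:

import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),   # 9x9 -> 7x7
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1)                    # 7x7 -> 1x1
)

print(encoder(torch.randn(1, 4, 84, 84)).shape)  # torch.Size([1, 8, 1, 1])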

What I cannot do is train the autoencoder with

def forward(self, x):
    x = self.encoder(x)
    x = self.decoder(x)
    return x

The decoder gives me an error:

Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size

I would like to thank the person who provided the first answer.


Solution

  • In the decoder part, you need to upsample to a larger size, which can be done via nn.ConvTranspose2d. I notice that in your encoder part it seems you didn't downsample your feature maps, because your stride is always 1. Here is a toy example.

    # toy example assuming a 1-channel 32x32 input
    self.encoder = nn.Sequential(
        nn.Conv2d(1, 16, 3, stride=1, padding=1),   # b, 16, 32, 32
        nn.ReLU(True),
        nn.MaxPool2d(2, stride=2),                  # b, 16, 16, 16
        nn.Conv2d(16, 32, 3, stride=1, padding=1),  # b, 32, 16, 16
        nn.ReLU(True),
        nn.MaxPool2d(2, stride=2)                   # b, 32, 8, 8
    )
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # b, 16, 16, 16
        nn.ReLU(True),
        nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # b, 1, 32, 32
        nn.Sigmoid()
    )
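
  • The same idea applies to the updated architecture in the question: the second and third layers of that decoder are still plain Conv2d layers, so they shrink the 3 x 3 map coming out of the first ConvTranspose2d instead of enlarging it, which is exactly why the 4 x 4 kernel no longer fits. A possible mirrored decoder (only a sketch, and it assumes the 84x84 input that produces the [1, 8, 1, 1] bottleneck) undoes each encoder stage with a ConvTranspose2d, including one for the MaxPool2d:

    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(8, 8, kernel_size=7, stride=1),    # undo MaxPool2d(7): 1x1 -> 7x7
        nn.ReLU(),
        nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),   # mirror Conv2d(64, 8, 3): 7x7 -> 9x9
        nn.ReLU(),
        nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2),  # mirror Conv2d(32, 64, 4, stride=2): 9x9 -> 20x20
        nn.ReLU(),
        nn.ConvTranspose2d(32, input_shape[0], kernel_size=8, stride=4),  # mirror the first conv: 20x20 -> 84x84
        nn.Sigmoid()
    )

    Each transposed convolution simply inverts the size arithmetic of the corresponding encoder layer (1 -> 7 -> 9 -> 20 -> 84); for a different input size you may need padding or output_padding to make the numbers line up.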