machine-learning · pytorch · neural-network · autoencoder · activation-function

What is the purpose of having the same input and output in PyTorch nn.Linear function?


I think this is a comprehension issue, but I would appreciate any help. I'm trying to learn how to use PyTorch for autoencoding. The nn.Linear layer takes two size parameters: nn.Linear(input_size, hidden_size).

When reshaping a tensor to its minimum meaningful representation, as one would in autoencoding, it makes sense that the hidden_size would be smaller. However, in the PyTorch tutorial there is a line specifying identical input_size and hidden_size:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

I guess my question is, what is the purpose of having the same input and hidden size? Wouldn't this just return an identical tensor?

I suspect that this is just a requirement of calling the nn.ReLU() activation function.


Solution

  • As Wikipedia puts it:

    An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. The encoding is validated and refined by attempting to regenerate the input from the encoding.

    In other words, the idea of an autoencoder is to learn an identity function. This identity is learned only for particular inputs (e.g. inputs without anomalies). Two points follow from this:

    1. Input will have same dimensions as output
    2. Autoencoders are (generally) built to learn the essential features of the input
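
    Incidentally, this answers the "identical tensor" question directly: a nn.Linear layer with equal input and output sizes still applies a learned weight matrix and bias (y = xWᵀ + b), so it transforms its input rather than copying it. A quick check (layer sizes here are just for illustration):

        import torch
        import torch.nn as nn

        torch.manual_seed(0)

        # Equal input and output sizes: the shape is preserved, but the
        # layer still applies randomly initialized learned weights.
        layer = nn.Linear(512, 512)
        x = torch.randn(1, 512)
        y = layer(x)

        print(y.shape)               # torch.Size([1, 512]) -- same shape
        print(torch.allclose(x, y))  # False -- not the same values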

    Because of point (1), an autoencoder is built as a series of layers (e.g. nn.Linear() or nn.Conv2d() layers). Because of point (2), it generally consists of an Encoder that compresses the information (as in your code snippet, going from 28x28 down to 10) and a Decoder that decompresses it again (10 -> 28x28). Across most implementations of this architecture, the latent dimensionality (here 10) is much smaller than the input (28x28).

    With the end goal of the Encoder clear, you can see the role of the intermediate widths: layers like nn.Linear(28*28, 512) and nn.Linear(512, 512) transform the representation on the way down, and those intermediate sizes disappear once the stack produces its final output (10). Only the last layer's output size determines the dimensionality of the code.
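
    The encoder/decoder idea above can be sketched as follows. This is a minimal illustration assuming flattened 28x28 inputs and a latent size of 10; the intermediate width (512) is an arbitrary implementation choice, not taken from any particular tutorial:

        import torch
        import torch.nn as nn

        class AutoEncoder(nn.Module):
            def __init__(self):
                super().__init__()
                # Encoder compresses: 784 -> 512 -> 10
                self.encoder = nn.Sequential(
                    nn.Linear(28 * 28, 512),
                    nn.ReLU(),
                    nn.Linear(512, 10),
                )
                # Decoder decompresses: 10 -> 512 -> 784
                self.decoder = nn.Sequential(
                    nn.Linear(10, 512),
                    nn.ReLU(),
                    nn.Linear(512, 28 * 28),
                )

            def forward(self, x):
                return self.decoder(self.encoder(x))

        model = AutoEncoder()
        x = torch.randn(4, 28 * 28)           # a batch of 4 flattened images
        reconstruction = model(x)
        print(reconstruction.shape)           # torch.Size([4, 784])

    Note that, per point (1), the reconstruction has the same dimensions as the input; training would then minimize a reconstruction loss such as nn.MSELoss() between the two.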