Tags: python, pytorch, conv-neural-network, artificial-intelligence

How to determine parameters for nn.Conv2d()


I am reading this research paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) and trying to follow along with the code on GitHub. I don't understand how the parameters for nn.Conv2d() were determined. For the first Conv2d: does 64@96*96 mean 64 channels with a 96 x 96 kernel size? If so, why is the kernel size 10 in the function call? I have googled the parameters and their meanings, and from what I read I understand that the arguments are (input_channels, output_channels, kernel_size).

Here is the code on GitHub: https://github.com/fangpin/siamese-pytorch/blob/master/train.py

For reference, page 4 of the research paper has the model schematic.

       self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 10),  # 64@96*96
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 64@48*48
            nn.Conv2d(64, 128, 7),
            nn.ReLU(),    # 128@42*42
            nn.MaxPool2d(2),   # 128@21*21
            nn.Conv2d(128, 128, 4),
            nn.ReLU(), # 128@18*18
            nn.MaxPool2d(2), # 128@9*9
            nn.Conv2d(128, 256, 4),
            nn.ReLU(),   # 256@6*6
        )
        self.liner = nn.Sequential(nn.Linear(9216, 4096), nn.Sigmoid())
        self.out = nn.Linear(4096, 1)

Solution

  • If you look at the model schematic, it is showing two things:

    • the parameters of the convolution kernel, and
    • the shape of the feature maps (the output of the nn.Conv2d op).

    For example, the first Conv2d layer is 64@10x10, meaning 64 output channels and a 10x10 kernel.
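
    As a quick sanity check, you can construct that layer and inspect its weight tensor; a minimal example:

        import torch.nn as nn

        layer = nn.Conv2d(1, 64, 10)   # (in_channels, out_channels, kernel_size)
        print(layer.weight.shape)      # torch.Size([64, 1, 10, 10]) -> 64 filters, each 1x10x10
        print(layer.bias.shape)        # torch.Size([64]) -> one bias per output channel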

    Whereas 64@96x96 describes the feature map, which comes from applying that 64@10x10 convolution to a 105x105x1 input. With the default stride of 1 and no padding, each spatial dimension shrinks to 105 - 10 + 1 = 96, so you get 64 output channels, each 96x96.
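
    If you want to verify the sizes in the code comments yourself, here is a minimal sketch (assuming a single-channel 105x105 input, as in the paper) that pushes a dummy tensor through the same stack and prints each intermediate shape:

        import torch
        import torch.nn as nn

        conv = nn.Sequential(
            nn.Conv2d(1, 64, 10), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 4), nn.ReLU(),
        )

        x = torch.randn(1, 1, 105, 105)  # batch of one 105x105 grayscale image
        for layer in conv:
            x = layer(x)
            print(type(layer).__name__, tuple(x.shape))
        # ends at (1, 256, 6, 6); flattened, 256*6*6 = 9216, which matches nn.Linear(9216, 4096)

    In general, each Conv2d with stride 1 and no padding shrinks the spatial size by kernel_size - 1, and each MaxPool2d(2) halves it, which is exactly how 105 -> 96 -> 48 -> 42 -> 21 -> 18 -> 9 -> 6 falls out.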