python pytorch conv-neural-network artificial-intelligence

How to determine parameters for nn.Conv2d()

I am reading this research paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) and trying to follow along with the code on GitHub. I don't understand how the parameters for nn.Conv2d() were determined. For the first Conv2d: does 64@96*96 mean 64 channels with a 96 x 96 kernel size? If so, why is the kernel size 10 in the function call? I have googled the parameters and their meanings, and from what I read I understand that they are (input_channels, output_channels, kernel_size).

Here is the code on GitHub: https://github.com/fangpin/siamese-pytorch/blob/master/train.py

For reference, page 4 of the research paper has the model schematic.

        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 10),  # 64@96*96
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 64@48*48
            nn.Conv2d(64, 128, 7),
            nn.ReLU(),    # 128@42*42
            nn.MaxPool2d(2),   # 128@21*21
            nn.Conv2d(128, 128, 4),
            nn.ReLU(), # 128@18*18
            nn.MaxPool2d(2), # 128@9*9
            nn.Conv2d(128, 256, 4),
            nn.ReLU(),   # 256@6*6
        )
        self.liner = nn.Sequential(nn.Linear(9216, 4096), nn.Sigmoid())
        self.out = nn.Linear(4096, 1)

Solution

The model schematic shows two different things for each layer:

The parameters of the convolution kernel
The dimensions of the feature maps (the output of the nn.Conv2d op)

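For example, the first Conv2d layer is labelled 64@10x10 in the schematic, meaning 64 output channels and a 10x10 kernel. That corresponds to nn.Conv2d(1, 64, 10): 1 input channel, 64 output channels, and a kernel size of 10.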
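
As a quick check of the kernel notation, the sketch below builds only the first layer and prints the shape of its learned weights. It assumes PyTorch is installed; the variable name conv is just for illustration and does not come from the repository.

```python
import torch.nn as nn

# First layer from the question, written with keyword arguments for clarity.
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=10)

# Weights are stored as (out_channels, in_channels, kernel_height, kernel_width),
# i.e. 64 kernels of size 10x10, each spanning the single input channel.
print(conv.weight.shape)

# One bias value per output channel.
print(conv.bias.shape)
```

The 64 and the 10x10 in the weight shape are the "64@10x10" from the schematic; nothing about the 96x96 feature map is stored in the layer itself.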

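The feature map, on the other hand, is 64@96x96, which comes from applying the 64@10x10 convolution to a 105x105x1 input. With no padding and a stride of 1 (the nn.Conv2d defaults), the output width and height are 105 - 10 + 1 = 96, and there are 64 output channels. So the "# 64@96*96" comment in the code describes the output of that layer, not its kernel.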
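
The same arithmetic can be carried through every layer in the question. This is a minimal sketch in plain Python, assuming a 105x105 input, default stride and padding for the convolutions, and MaxPool2d(2) using a stride equal to its kernel size; the helper names conv_out and pool_out are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output width/height of max pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


# Layer sequence from the question: (layer type, kernel size).
layers = [
    ("conv", 10), ("pool", 2),
    ("conv", 7), ("pool", 2),
    ("conv", 4), ("pool", 2),
    ("conv", 4),
]

size = 105  # assumed input width/height
sizes = [size]
for kind, kernel in layers:
    size = conv_out(size, kernel) if kind == "conv" else pool_out(size, kernel)
    sizes.append(size)

print(sizes)  # [105, 96, 48, 42, 21, 18, 9, 6]

# The last conv layer has 256 output channels, so the flattened size is:
flat_features = 256 * size * size
print(flat_features)  # 9216
```

The sizes match the comments in the code (96, 48, 42, 21, 18, 9, 6), and the final 256 x 6 x 6 = 9216 is the input size of nn.Linear(9216, 4096).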