I am reading this research paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) and trying to follow along with the code on GitHub. I don't understand how the parameters for nn.Conv2d() were determined. For the first Conv2d: does 64@96*96 mean 64 channels with a 96 x 96 kernel size? If so, why is the kernel size 10 in the call? I have looked up the parameters and their meanings, and from what I read, the arguments are (input_channels, output_channels, kernel_size).
Here is the code on GitHub: https://github.com/fangpin/siamese-pytorch/blob/master/train.py
For reference, page 4 of the research paper has the model schematic.
self.conv = nn.Sequential(
    nn.Conv2d(1, 64, 10),  # 64@96*96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),  # 64@48*48
    nn.Conv2d(64, 128, 7),
    nn.ReLU(),  # 128@42*42
    nn.MaxPool2d(2),  # 128@21*21
    nn.Conv2d(128, 128, 4),
    nn.ReLU(),  # 128@18*18
    nn.MaxPool2d(2),  # 128@9*9
    nn.Conv2d(128, 256, 4),
    nn.ReLU(),  # 256@6*6
)
self.liner = nn.Sequential(nn.Linear(9216, 4096), nn.Sigmoid())
self.out = nn.Linear(4096, 1)
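If you look at the model schematic, it's showing two things:
1. The parameters of each convolution kernel
2. The feature maps (the output of each nn.Conv2d op)
For example, the first conv2d layer is 64@10x10, meaning 64 output channels and a 10x10 kernel. The feature map, on the other hand, is 64@96x96, which comes from applying the 64@10x10 convolution op to a 105x105x1 input. With the default stride of 1 and no padding, this gives 64 output channels and a width and height of 105-10+1=96. So the # 64@96*96 comment in the code describes the layer's output, not its kernel.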
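The same arithmetic can be carried through the rest of the network. The following is a minimal sketch in plain Python, not code from the repository: the helper names conv_out, pool_out and trace_shapes are made up for illustration, and it assumes a 105x105 single-channel input with the nn.Conv2d and nn.MaxPool2d defaults (stride 1 and no padding for the convolutions, pooling stride equal to the pooling kernel size).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial size after max pooling with stride == kernel and no padding."""
    return (size - kernel) // kernel + 1


# (layer kind, out_channels, kernel size), read off the nn.Sequential in the question.
# Pooling layers do not change the channel count, so out_channels is None for them.
LAYERS = [
    ("conv", 64, 10),
    ("pool", None, 2),
    ("conv", 128, 7),
    ("pool", None, 2),
    ("conv", 128, 4),
    ("pool", None, 2),
    ("conv", 256, 4),
]


def trace_shapes(size=105, channels=1):
    """Return (channels, spatial size) after each conv/pool layer."""
    shapes = []
    for kind, out_channels, kernel in LAYERS:
        if kind == "conv":
            channels = out_channels
            size = conv_out(size, kernel)
        else:
            size = pool_out(size, kernel)
        shapes.append((channels, size))
    return shapes


shapes = trace_shapes()
for channels, size in shapes:
    print(f"{channels}@{size}x{size}")

# Number of features after flattening the last feature map
last_channels, last_size = shapes[-1]
flat_features = last_channels * last_size * last_size
print(flat_features)
```

Under those assumptions, the printed sizes match the comments in the question's code (64@96x96 down to 256@6x6), and the flattened size 256*6*6 = 9216 is where the first argument of nn.Linear(9216, 4096) comes from.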