I am looking at the ResNet implementation in PyTorch. The first layer is a convolutional layer with filter size = 7, stride = 2, pad = 3. The standard input size to the network is 224x224x3. Based on these numbers, the output dimensions are (224 + 3*2 - 7)/2 + 1 = 112.5, which is not an integer. Does the original implementation contain non-integer dimensions? I see that the network has adaptive pooling before the FC layer, so variable input dimensions aren't a problem (I tested this by varying the input size). Am I doing something wrong, or why would the authors choose a non-integer dimension when designing ResNet?
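The dimensions always have to be integers. From the nn.Conv2d documentation:
- Shape:
  $$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
  (and likewise for $W_{out}$, using index 1 of each parameter)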
The brackets that are closed only at the bottom (⌊ ⌋) denote the floor operation (rounding down). For the first layer, the calculation becomes:
import math
math.floor((224 + 3*2 - 7)/2 + 1) # => 112
# Or using the integer division (two slashes //)
(224 + 3*2 - 7) // 2 + 1 # => 112
Integer division has the same effect here, since // always rounds the result down to the nearest integer.
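If you want to check other input sizes, the same calculation can be wrapped in a small helper. This is only a sketch: `conv2d_output_size` is a hypothetical name of my own, not a PyTorch function, and it assumes a single spatial dimension with integer kernel size, stride, padding and dilation, following the shape formula from the nn.Conv2d documentation.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution.

    Hypothetical helper (not part of PyTorch) that mirrors the shape
    formula in the nn.Conv2d documentation, including the floor.
    """
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input positions.
    effective_kernel = dilation * (kernel_size - 1) + 1
    # Floor division drops the fractional part; the +1 stays outside of it.
    return (size + 2 * padding - effective_kernel) // stride + 1


# First ResNet layer: kernel size 7, stride 2, padding 3.
for size in (224, 225, 226):
    print(size, "->", conv2d_output_size(size, kernel_size=7, stride=2, padding=3))
# 224 -> 112
# 225 -> 113
# 226 -> 113
```

Because of the floor, 225 and 226 give the same output size: the last row/column of the padded input is simply not covered by any filter position when the numbers don't divide evenly.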