
ResNet-18 first layer output dimensions


I am looking at the ResNet-18 model implementation in PyTorch. The first layer is a convolutional layer with filter size 7, stride 2 and padding 3. The standard input size to the network is 224x224x3. Based on these numbers, the output size is (224 + 3*2 - 7)/2 + 1 = 112.5, which is not an integer. Does the original implementation contain non-integer dimensions? I see that the network has an adaptive pooling layer before the FC layer, so variable input dimensions aren't a problem (I tested this by varying the input size). Am I doing something wrong, or why would the authors choose a non-integer dimension when designing ResNet?
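To make the fractional result concrete, a minimal stdlib-only Python sketch can count the kernel placements directly instead of using the formula. The helper `count_windows` is a hypothetical name used only for illustration; it assumes no dilation and counts a placement only if the whole 7-pixel window fits inside the 224 + 2*3 = 230-pixel padded input. The count is necessarily an integer, and the extra half in the arithmetic corresponds to the final padded pixel that no window reaches.

```python
def count_windows(in_size: int, kernel: int, stride: int, pad: int) -> int:
    # Zero padding is added on both sides of the input.
    padded = in_size + 2 * pad
    # A kernel starting at index s covers s .. s + kernel - 1, so s may range
    # from 0 up to padded - kernel (inclusive), stepping by the stride.
    starts = range(0, padded - kernel + 1, stride)
    return len(starts)


# First ResNet-18 layer: 7x7 kernel, stride 2, padding 3, 224-pixel input.
n = count_windows(224, kernel=7, stride=2, pad=3)
last_covered = (n - 1) * 2 + 7 - 1   # last padded index touched by any window
last_padded = 224 + 2 * 3 - 1        # last index of the padded input
print(n, last_covered, last_padded)  # 112 228 229
```

Only index 229 (a padding pixel) is left out, so the layer produces 112 outputs per spatial dimension.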


Solution

  • The dimensions are always integers. From the nn.Conv2d documentation (Shape section):

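    H_out = ⌊(H_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0] + 1⌋
    (W_out is analogous, using index [1].)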

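    The brackets that are closed only at the bottom (⌊ ⌋) denote the floor operation (rounding down). With the default dilation = 1, the numerator reduces to the one in the question, because dilation*(kernel_size - 1) + 1 = kernel_size. The calculation becomes: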

    import math
    
    math.floor((224 + 3*2 - 7)/2 + 1) # => 112
    
    # Or using the integer division (two slashes //)
    (224 + 3*2 - 7) // 2 + 1 # => 112
    

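    Integer division gives the same result, because // always rounds the quotient down (and floor(x + 1) = floor(x) + 1). In other words, the .5 is simply discarded: the last padded row and column are never covered by a full 7x7 window, so no output is produced for them.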
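The same floor formula can be applied stage by stage through the network. This is a minimal sketch, assuming the kernel/stride/padding values used in torchvision's ResNet-18 (a 7x7 stride-2 conv1, a 3x3 stride-2 max pool with padding 1, and a stride-2 3x3 convolution with padding 1 at the start of layer2 to layer4; layer1 keeps the size and is omitted). The helper names `conv_out_size` and `trace` are hypothetical. It shows that every intermediate size is an integer for arbitrary inputs, while the final feature map fed to the adaptive average pool changes with the input size.

```python
def conv_out_size(in_size: int, kernel_size: int, stride: int = 1,
                  padding: int = 0, dilation: int = 1) -> int:
    # Integer division floors the quotient, matching the floor brackets in the
    # nn.Conv2d / nn.MaxPool2d (ceil_mode=False) shape formula; valid here
    # because the numerator is non-negative.
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Assumed ResNet-18 downsampling steps: (name, kernel_size, stride, padding).
RESNET18_STAGES = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def trace(in_size: int) -> list[tuple[str, int]]:
    # Apply each downsampling step in order and record the spatial size.
    sizes = []
    size = in_size
    for name, k, s, p in RESNET18_STAGES:
        size = conv_out_size(size, k, s, p)
        sizes.append((name, size))
    return sizes


for side in (224, 225, 200):
    print(side, trace(side))
```

For a 224-pixel input this gives 112 after conv1, 56 after the max pool, and 28, 14 and 7 after layer2 to layer4; a 225-pixel input ends at 8 instead, which the adaptive pooling layer then reduces to 1x1 either way.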