Unexpected number of fully connected neurons after padding with stride in PyTorch

I'm trying to replicate the procedure of the paper (Re-)Imag(in)ing Price Trends, which trains a 2D CNN on OHLC chart images. The images come in three sizes (32x15, 64x60 and 96x180), corresponding to 5, 20 and 60 daily bars, and each size has its own architecture. However, for the 20-day horizon (64x60) I end up with a different number of neurons in the fully connected layer than the paper reports.

I followed the specification of their architecture that can be summarised as:

Number of blocks: (32x15): 2; (64x60):3 and (96x180):4
Fixed number of filters in 1st block: 64, else number doubles each convolutional block
5x3 (kernel size) convolutional filters (for all image types)
2x1 max-pooling filters (for all image types)
vertical stride of 1, 3 and 3 (first layer only) for 32x15, 64x60 and 96x180 respectively
vertical dilation rate of 1, 2 and 3 (first layer only) for 32x15, 64x60 and 96x180 respectively
padding such that output has SAME dimension as the image itself

I suspect that my issue has something to do with the workaround for padding="same" in PyTorch with asymmetric strides. Since strides can be >1, I went for the Conv2d workaround provided in this answer.

According to their paper (see figure), the CNN for the 20-day horizon (64x60) should end up with 46080 neurons in the FC layer. Below is the code for my architecture, which raises the following error when reshaping:

RuntimeError: shape '[-1, 46080]' is invalid for input of size 30720

Clearly, the dimensions are incorrect, and they change whenever I change the padding calculation. I cannot seem to get this right, and I'm not sure how else I would figure out the padding for each block in each specific model. Hope someone can help me out. Thanks in advance.

import torch
from torch import nn
import math
from functools import reduce
from operator import __add__
import torch.nn.functional as F

class Conv2dSame(nn.Conv2d):
    """
    https://github.com/pytorch/captum/blob/optim-wip/captum/optim/models/_common.py#L144
    """

    def calc_same_pad(self, i: int, k: int, s: int, d: int) -> int:
        pad = max((math.ceil(i / s) - 1) * s + (k - 1) * d + 1 - i, 0)
        return pad

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        
        ih, iw = x.size()[-2:]
        kh, kw = self.weight.size()[-2:]
        pad_h = self.calc_same_pad(i=ih, k=kh, s=self.stride[0], d=self.dilation[0])
        pad_w = self.calc_same_pad(i=iw, k=kw, s=self.stride[1], d=self.dilation[1])

        if pad_h > 0 or pad_w > 0:
            x = F.pad(
                x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]
            )
        return F.conv2d(
            x,
            self.weight,
            self.bias,
            self.stride,
            self.padding,
            self.dilation,
            self.groups,
        )

class Net20(nn.Module): 
    
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(
            Conv2dSame(1, 64, kernel_size=(5,3), stride=(3,1), dilation=(2,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1)) 
        )
        self.layer2 = nn.Sequential(
            Conv2dSame(64, 128, kernel_size=(5,3)),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1))
        )
        self.layer3 = nn.Sequential(
            Conv2dSame(128, 256, kernel_size=(5,3)),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.MaxPool2d((2, 1))
        )
        self.fc1 = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(46080, 1), 
        )

    def forward(self, x):
        x = x.reshape(-1,1,64,60)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = x.reshape(-1,46080) # FC neurons according to paper
        x = self.fc1(x)
        return x

Solution

You're right: the built-in padding='same' of nn.Conv2d does not support strides other than 1, so you have to implement it yourself with F.pad, which also lets you handle an odd total padding.

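First, I suggest you inspect MaxPool2d. By default it rounds the output size down: MaxPool2d((2, 2)) on a 3x3 image gives a 1x1 output (the last row and column are dropped), whereas you may expect 2x2. So whenever a pooled dimension is odd, you lose a row or column. Your pooling is (2, 1), so this concerns the height: it is 11 after the first block, and the second and third poolings each drop a row. You can try adding the ceil_mode=True argument (see this for details).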
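To make the rounding effect concrete, here is a minimal sketch in plain Python (no torch needed) that traces the height through the three blocks of the 64x60 network. The helper names same_conv_out, pool_out and fc_features are mine, not from the paper or from PyTorch, and I'm assuming the "same" convolution outputs ceil(size / stride) as in TensorFlow. It only shows that ceil_mode=True reproduces the 46080 count; whether the paper's own code does it this way is an assumption.

```python
import math


def same_conv_out(size: int, stride: int) -> int:
    # TensorFlow-style "same" padding: output length is ceil(size / stride)
    return math.ceil(size / stride)


def pool_out(size: int, kernel: int, ceil_mode: bool = False) -> int:
    # Max pooling with stride == kernel, no padding, no dilation
    n = (size - kernel) / kernel + 1
    return math.ceil(n) if ceil_mode else math.floor(n)


def fc_features(height: int = 64, width: int = 60, ceil_mode: bool = False) -> int:
    channels = 0
    # (output channels, vertical stride) of the three blocks in the question
    for out_channels, v_stride in [(64, 3), (128, 1), (256, 1)]:
        height = same_conv_out(height, v_stride)  # conv with "same" padding
        height = pool_out(height, 2, ceil_mode)   # MaxPool2d((2, 1))
        channels = out_channels
    # The width stays at 60: horizontal stride is 1 and the pooling is (2, 1)
    return channels * height * width


print(fc_features(ceil_mode=False))  # 30720 (height: 64 -> 22 -> 11 -> 5 -> 2)
print(fc_features(ceil_mode=True))   # 46080 (height: 64 -> 22 -> 11 -> 6 -> 3)
```

So with the default rounding you get exactly the 30720 from the error message, and with ceil_mode=True in the nn.MaxPool2d layers the count matches the 46080 of the paper.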

Now, regarding the padding calculation, note that there are many possible implementations, because "same" padding is ambiguous in your case. I found another implementation that should also mimic the TensorFlow behaviour (see here). It is not written exactly like the one you used, so you can try it:

class Conv2dSame(nn.Conv2d):

  def calc_same_pad(self, i: int, k: int, s: int, d: int) -> int:
    # (i + s - 1) // s instead of ceil(i / s)
    pad = max(0, ((i + s - 1) // s - 1) * s + (k - 1) * d + 1 - i)
    return pad

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Same size bookkeeping as in the forward() of the question
    ih, iw = x.size()[-2:]
    kh, kw = self.weight.size()[-2:]
    pad_h = self.calc_same_pad(i=ih, k=kh, s=self.stride[0], d=self.dilation[0])
    pad_w = self.calc_same_pad(i=iw, k=kw, s=self.stride[1], d=self.dilation[1])
    w_odd, h_odd = pad_w % 2 == 1, pad_h % 2 == 1

    if w_odd or h_odd:
      # Add 1 padding now for odd size (on the right and/or bottom side)
      x = F.pad(x, [0, int(w_odd), 0, int(h_odd)])

    return F.conv2d(
      x,
      weight=self.weight,
      bias=self.bias,
      stride=self.stride,
      padding=(pad_h // 2, pad_w // 2),  # add the rest of the padding here, in (height, width) order
      dilation=self.dilation,
      groups=self.groups,
      )

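Note that for positive integers (i + s - 1) // s is just the integer form of ceil(i / s): for instance with i = s + 1 = 4 both give 2. So this version should compute the same total padding as yours and differs only in how the padding is applied. I'll let you explore that.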
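If you want to explore this, a quick way is to evaluate both formulas over a range of sizes, strides, kernels and dilations. This is only a sketch: pad_ceil and pad_int are names I made up for the two calc_same_pad variants, and the ranges are arbitrary.

```python
import math


def pad_ceil(i: int, k: int, s: int, d: int) -> int:
    # calc_same_pad as written in the question
    return max((math.ceil(i / s) - 1) * s + (k - 1) * d + 1 - i, 0)


def pad_int(i: int, k: int, s: int, d: int) -> int:
    # calc_same_pad with the integer form (i + s - 1) // s
    return max(0, ((i + s - 1) // s - 1) * s + (k - 1) * d + 1 - i)


# Collect every (i, k, s, d) combination where the two variants disagree
mismatches = [
    (i, k, s, d)
    for i in range(1, 200)
    for s in range(1, 6)
    for k in (1, 3, 5)
    for d in (1, 2, 3)
    if pad_ceil(i, k, s, d) != pad_int(i, k, s, d)
]
print(len(mismatches))  # 0

# Total padding of the first layer of Net20: height, then width
print(pad_ceil(64, 5, 3, 2), pad_ceil(60, 3, 1, 1))  # 8 2
```

For the first layer this gives a total padding of 8 rows and 2 columns, so the convolution output is 22x60 with either variant.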

Notes:

the padding argument of this Conv2dSame has no effect (that's logical, as the padding is calculated on the fly in forward)
the two-step padding should be equivalent to what you did with [pad_w // 2, pad_w - pad_w // 2, ...], but I kept it to stay close to the original code (in case there is some strange behaviour in Conv2d, though I don't think there is)
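The second note can be checked with a few lines of plain Python. The function names here are hypothetical; each one returns the (before, after) padding applied along one dimension for a given total padding.

```python
def split_one_step(pad: int) -> tuple:
    # Question's version: a single F.pad call with [pad // 2, pad - pad // 2]
    return pad // 2, pad - pad // 2


def split_two_step(pad: int) -> tuple:
    # Answer's version: F.pad adds 1 at the end when pad is odd,
    # then the convolution itself pads pad // 2 on both sides
    odd = pad % 2
    return pad // 2, pad // 2 + odd


# Both strategies give the same split for every total padding tested
print(all(split_one_step(p) == split_two_step(p) for p in range(50)))  # True
print(split_one_step(3), split_two_step(3))  # (1, 2) (1, 2)
```

In both cases the extra row or column of an odd padding ends up at the bottom or right, so swapping one Conv2dSame for the other should not change the output shape.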

Hope it solves the problem.