I have a 3D convolutional network composed of an encoder and a decoder. In the decoder, I want to use the same type of convolutional layers as in the encoder, but have them perform deconvolution. In other words, these layers should increase the spatial size of the feature maps at the inverse of the rate at which the encoder shrinks it, while decreasing the number of channels. The convolutional block I use inside my encoder is a sequence of layers:
Conv_layer = nn.Sequential(
    BasicConv3d(64, 64, kernel_size=1, stride=1),
    SepConv3d(64, 192, kernel_size=3, stride=1, padding=1),
    nn.MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1)),
)
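To make the rates concrete, here is the shape arithmetic for this encoder block, done in pure Python with the standard convolution size formula (the input size 16×56×56 is just an assumed example):

```python
# Shape arithmetic for the encoder block above (pure Python, no torch needed).
# Per-dimension output size of a conv/pool: floor((n + 2p - k) / s) + 1.
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

# Hypothetical input: 16 frames of 56x56 feature maps.
t, h, w = 16, 56, 56

# BasicConv3d(64, 64, kernel_size=1, stride=1): 1x1x1 conv keeps all sizes.
t, h, w = (conv_out(n, 1, 1, 0) for n in (t, h, w))

# SepConv3d(64, 192, kernel_size=3, stride=1, padding=1): both the spatial
# (1,3,3) and temporal (3,1,1) convs preserve sizes (k=3, s=1, p=1).
t = conv_out(t, 3, 1, 1)
h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)

# MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1)):
# temporal size unchanged, spatial size halved.
t = conv_out(t, 1, 1, 0)
h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)

print(t, h, w)  # -> 16 28 28
```

So the block only changes sizes in the final pooling step: the temporal dimension is untouched and the spatial dimensions are halved, which is the rate the decoder has to invert.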
The types of convolutional layers used:
class BasicConv3d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(BasicConv3d, self).__init__()
        self.conv = nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x
class SepConv3d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(SepConv3d, self).__init__()
        self.conv_s = nn.Conv3d(in_planes, out_planes, kernel_size=(1,kernel_size,kernel_size), stride=(1,stride,stride), padding=(0,padding,padding), bias=False)
        self.bn_s = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu_s = nn.ReLU()
        self.conv_t = nn.Conv3d(out_planes, out_planes, kernel_size=(kernel_size,1,1), stride=(stride,1,1), padding=(padding,0,0), bias=False)
        self.bn_t = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu_t = nn.ReLU()

    def forward(self, x):
        x = self.conv_s(x)
        x = self.bn_s(x)
        x = self.relu_s(x)
        x = self.conv_t(x)
        x = self.bn_t(x)
        x = self.relu_t(x)
        return x
My question is: how should I change the kernel_size, stride, and padding so that these layers become deconvolutions that increase the spatial size of the feature maps at the inverse of the rate at which the encoder's convolutional layers decrease it?
Basically, you want a nn.ConvTranspose3d with kernel size (3, 3, 3) and stride (1, 2, 2). You can see the formula relating input size to output size here:

D_out = (D_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1

In your case, kernel_size=3, dilation=1, and stride is 1 for the temporal dimension and 2 for the spatial dimensions.
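Plugging those numbers into the formula, with a helper function (`deconv_out` is just an illustrative name) and assumed example sizes of 28 (spatial) and 16 (temporal):

```python
# Worked example of the ConvTranspose3d size formula, one dimension at a time.
def deconv_out(n, k, s, p, op, d=1):
    return (n - 1) * s - 2 * p + d * (k - 1) + op + 1

# Spatial dimensions: kernel 3, stride 2, padding 1, output_padding 1
# doubles the size, inverting a stride-2 downsampling.
print(deconv_out(28, k=3, s=2, p=1, op=1))  # -> 56

# Temporal dimension: kernel 3, stride 1, padding 1 leaves the size unchanged.
print(deconv_out(16, k=3, s=1, p=1, op=0))  # -> 16
```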
Thus the desired layers would be something like:
out_planes = 192
in_planes = 64
deconv_layer = nn.Sequential(
    # padding=(1, 0, 0) keeps the temporal size unchanged, mirroring conv_t in SepConv3d
    nn.ConvTranspose3d(out_planes, out_planes, kernel_size=(3, 1, 1), stride=1, padding=(1, 0, 0), output_padding=0),
    nn.BatchNorm3d(out_planes),
    nn.ReLU(inplace=True),
    # stride (1, 2, 2) with output_padding (0, 1, 1) doubles the spatial size, inverting the MaxPool3d
    nn.ConvTranspose3d(out_planes, in_planes, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), output_padding=(0, 1, 1)),
    nn.BatchNorm3d(in_planes),
    nn.ReLU(inplace=True),
    BasicConv3d(in_planes, in_planes, kernel_size=1, stride=1),  # to be consistent with the Conv_layer structure
)
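A note on why output_padding=(0, 1, 1) matters: a stride-2 convolution or pooling maps two consecutive input sizes to the same output size, so the transposed convolution needs output_padding to pick which one to restore. A quick sketch with the two size formulas (assuming your feature maps have even spatial sizes, as in the 55/56 example below):

```python
# Per-dimension output sizes of a conv/pool and of a transposed conv.
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k, s, p, op):
    return (n - 1) * s - 2 * p + (k - 1) + op + 1

# A stride-2 conv/pool maps two consecutive sizes to the same output size:
assert conv_out(55, 3, 2, 1) == 28
assert conv_out(56, 3, 2, 1) == 28

# output_padding selects which of the two the transposed conv restores:
assert deconv_out(28, 3, 2, 1, op=0) == 55
assert deconv_out(28, 3, 2, 1, op=1) == 56
```

So with even spatial sizes, output_padding=1 on the spatial dimensions makes deconv_layer exactly invert the size change of your Conv_layer; if your sizes were odd you would use output_padding=0 instead.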