I have a 3D convolutional network composed of an encoder and a decoder. In the decoder, I want to use the same type of convolutional layers as in the encoder, but have them perform deconvolution. In other words, these layers should increase the spatial size of the feature maps at the inverse of the rate at which the encoder shrinks it, while decreasing the number of channels. The convolutional block I use inside my encoder is a sequence of layers:
Conv_layer = nn.Sequential(
    BasicConv3d(64, 64, kernel_size=1, stride=1),
    SepConv3d(64, 192, kernel_size=3, stride=1, padding=1),
    nn.MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1)),
)
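To make the rates concrete, here is the shape arithmetic for this encoder block, done in pure Python with the standard convolution size formula (the input size 16×56×56 is just an assumed example):

```python
# Shape arithmetic for the encoder block above (pure Python, no torch needed).
# Per-dimension output size of a conv/pool: floor((n + 2p - k) / s) + 1.
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

# Hypothetical input: 16 frames of 56x56 feature maps.
t, h, w = 16, 56, 56

# BasicConv3d(64, 64, kernel_size=1, stride=1): 1x1x1 conv keeps all sizes.
t, h, w = (conv_out(n, 1, 1, 0) for n in (t, h, w))

# SepConv3d(64, 192, kernel_size=3, stride=1, padding=1): both the spatial
# (1,3,3) and temporal (3,1,1) convs preserve sizes (k=3, s=1, p=1).
t = conv_out(t, 3, 1, 1)
h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)

# MaxPool3d(kernel_size=(1,3,3), stride=(1,2,2), padding=(0,1,1)):
# temporal size unchanged, spatial size halved.
t = conv_out(t, 1, 1, 0)
h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)

print(t, h, w)  # -> 16 28 28
```

So the block only changes sizes in the final pooling step: the temporal dimension is untouched and the spatial dimensions are halved, which is the rate the decoder has to invert.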
The types of convolutional layers used:
class BasicConv3d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(BasicConv3d, self).__init__()
        self.conv = nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x
class SepConv3d(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size, stride, padding=0):
        super(SepConv3d, self).__init__()
        self.conv_s = nn.Conv3d(in_planes, out_planes, kernel_size=(1,kernel_size,kernel_size), stride=(1,stride,stride), padding=(0,padding,padding), bias=False)
        self.bn_s = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu_s = nn.ReLU()
        self.conv_t = nn.Conv3d(out_planes, out_planes, kernel_size=(kernel_size,1,1), stride=(stride,1,1), padding=(padding,0,0), bias=False)
        self.bn_t = nn.BatchNorm3d(out_planes, eps=1e-3, momentum=0.001, affine=True)
        self.relu_t = nn.ReLU()

    def forward(self, x):
        x = self.conv_s(x)
        x = self.bn_s(x)
        x = self.relu_s(x)
        x = self.conv_t(x)
        x = self.bn_t(x)
        x = self.relu_t(x)
        return x
My question is: how should I change the kernel_size, stride, and padding so that these layers become deconvolutions that increase the spatial size of the feature maps at the inverse of the rate at which the encoder's convolutional layers decrease it?
Basically, you want a nn.ConvTranspose3d with kernel size (3, 3, 3) and stride (1, 2, 2). You can see the formula relating input size to output size here:

D_out = (D_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1

In your case, kernel_size=3, dilation=1, and stride is 1 for the temporal dimension and 2 for the spatial dimensions.
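Plugging those numbers into the formula, with a helper function (`deconv_out` is just an illustrative name) and assumed example sizes of 28 (spatial) and 16 (temporal):

```python
# Worked example of the ConvTranspose3d size formula, one dimension at a time.
def deconv_out(n, k, s, p, op, d=1):
    return (n - 1) * s - 2 * p + d * (k - 1) + op + 1

# Spatial dimensions: kernel 3, stride 2, padding 1, output_padding 1
# doubles the size, inverting a stride-2 downsampling.
print(deconv_out(28, k=3, s=2, p=1, op=1))  # -> 56

# Temporal dimension: kernel 3, stride 1, padding 1 leaves the size unchanged.
print(deconv_out(16, k=3, s=1, p=1, op=0))  # -> 16
```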
Thus the desired layers would be something like:
out_planes = 192
in_planes = 64
deconv_layer = nn.Sequential(
    # padding=(1, 0, 0) keeps the temporal size unchanged, mirroring conv_t in SepConv3d
    nn.ConvTranspose3d(out_planes, out_planes, kernel_size=(3, 1, 1), stride=1, padding=(1, 0, 0), output_padding=0),
    nn.BatchNorm3d(out_planes),
    nn.ReLU(inplace=True),
    # stride (1, 2, 2) with output_padding (0, 1, 1) doubles the spatial size, inverting the MaxPool3d
    nn.ConvTranspose3d(out_planes, in_planes, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1), output_padding=(0, 1, 1)),
    nn.BatchNorm3d(in_planes),
    nn.ReLU(inplace=True),
    BasicConv3d(in_planes, in_planes, kernel_size=1, stride=1),  # to be consistent with the Conv_layer structure
)
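A note on why output_padding=(0, 1, 1) matters: a stride-2 convolution or pooling maps two consecutive input sizes to the same output size, so the transposed convolution needs output_padding to pick which one to restore. A quick sketch with the two size formulas (assuming your feature maps have even spatial sizes, as in the 55/56 example below):

```python
# Per-dimension output sizes of a conv/pool and of a transposed conv.
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k, s, p, op):
    return (n - 1) * s - 2 * p + (k - 1) + op + 1

# A stride-2 conv/pool maps two consecutive sizes to the same output size:
assert conv_out(55, 3, 2, 1) == 28
assert conv_out(56, 3, 2, 1) == 28

# output_padding selects which of the two the transposed conv restores:
assert deconv_out(28, 3, 2, 1, op=0) == 55
assert deconv_out(28, 3, 2, 1, op=1) == 56
```

So with even spatial sizes, output_padding=1 on the spatial dimensions makes deconv_layer exactly invert the size change of your Conv_layer; if your sizes were odd you would use output_padding=0 instead.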