pytorch

If an instance of an nn.Module subclass is used by two different sequential layers, are the weights shared between them?


Apologies if the terminology in the title is strange or incorrect; I am trying to describe the following scenario.

As a minimal example, I define a network as follows:

import torch.nn as nn

class Convolution_Layers(nn.Module):
  def __init__(self, in_channels, out_channels, kernel_size):
    super(Convolution_Layers, self).__init__()

    self.conv2d = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size)

    self.conv2d_layers = nn.Sequential(
      self.conv2d,
      nn.ReLU(),
    )

  def forward(self, x):
    return self.conv2d_layers(x)
    

class Network_Model(nn.Module):
  def __init__(self):
    super(Network_Model, self).__init__()
    self.basic_conv = Convolution_Layers(1, 1, 3)

    self.subnetwk_1 = nn.ModuleList().append([self.basic_conv])
    self.subnetwk_2 = nn.ModuleList().append([self.basic_conv])

  def forward(self, x1, x2):
    out1, out2 = x1, x2
    for l in self.subnetwk_1:
      out1 = l(out1)
    for l in self.subnetwk_2:
      out2 = l(out2)
    return out1, out2

I would like to know whether this results in the weights of subnetwork 1 and subnetwork 2 being shared, since both come from the same instance of Convolution_Layers.

Ideally, I would like the weights to be separate, but still be able to define the basic convolution block only once and re-use it elsewhere. There may be a better way of accomplishing this.
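One way to check this yourself is to compare the parameter tensors by identity. A minimal sketch, using a plain `nn.Conv2d` in place of the `Convolution_Layers` wrapper:

```python
import torch.nn as nn

# Registering one instance in two containers means both containers
# hold references to the very same weight tensor.
shared = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
list_a = nn.ModuleList([shared])
list_b = nn.ModuleList([shared])

print(list_a[0].weight is list_b[0].weight)  # True: one tensor, two references
```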


Solution

  • There is a small mistake in your code: `nn.ModuleList().append([self.basic_conv])` passes a Python list to `append`, which expects a single module; pass the list to the `nn.ModuleList` constructor instead. To answer your question: yes, both sub-networks will share the same weights, because you registered a single instance in both lists rather than two separate ones.

    shared_conv = Convolution_Layers(1, 1, 3)
    self.subnetwk_1 = nn.ModuleList([shared_conv])
    self.subnetwk_2 = nn.ModuleList([shared_conv])
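You can confirm the sharing empirically: an in-place change made through one list is visible through the other. A small sketch, with a bare `nn.Conv2d` standing in for the `Convolution_Layers` block:

```python
import torch
import torch.nn as nn

shared_conv = nn.Conv2d(1, 1, 3)  # stand-in for the Convolution_Layers block
subnetwk_1 = nn.ModuleList([shared_conv])
subnetwk_2 = nn.ModuleList([shared_conv])

# Zero the weights through subnetwk_1 ...
with torch.no_grad():
    subnetwk_1[0].weight.zero_()

# ... and the change is visible through subnetwk_2: same underlying tensor.
print(subnetwk_2[0].weight.abs().sum().item())  # 0.0
```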
    

    What does "be able to create the basic convolution block only once" mean? If you are looking to have the two sub-networks share the same architecture but with separate weights, then you need to create two separate instances:

    self.subnetwk_1 = nn.ModuleList([Convolution_Layers(1, 1, 3)])
    self.subnetwk_2 = nn.ModuleList([Convolution_Layers(1, 1, 3)])
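With two separate instances, the parameters are distinct tensors that train independently; a quick identity check, again with `nn.Conv2d` as a stand-in:

```python
import torch.nn as nn

# Two independent instances: same architecture, separate weights.
net_1 = nn.ModuleList([nn.Conv2d(1, 1, 3)])
net_2 = nn.ModuleList([nn.Conv2d(1, 1, 3)])

print(net_1[0].weight is net_2[0].weight)  # False: distinct tensors
```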
    

    If you want separate sub-networks that share their constructor arguments, you can collect the arguments in a dictionary and unpack it with `**` each time you instantiate the block:

    params = dict(in_channels=1, out_channels=1, kernel_size=3)
    self.subnetwk_1 = nn.ModuleList([Convolution_Layers(**params)])
    self.subnetwk_2 = nn.ModuleList([Convolution_Layers(**params)])
    

    Or, depending on the complexity of your initialization, use a helper function; perhaps that is what you intended with `self.basic_conv` in your code snippet:

    class Network_Model(nn.Module):
      def __init__(self):
        super(Network_Model, self).__init__()
        self.subnetwk_1 = nn.ModuleList([self.basic_conv()])
        self.subnetwk_2 = nn.ModuleList([self.basic_conv()])

      def basic_conv(self):
        return Convolution_Layers(1, 1, 3)