python, neural-network, pytorch, backpropagation

How to train shared layers in PyTorch


I have the following code:

import torch
import torch.nn as nn
from torchviz import make_dot

class Net(nn.Module):
    def __init__(self, input, output):
        super(Net, self).__init__()
        self.fc = nn.Linear(input, output)

    def forward(self, x):
        x = self.fc(x)  # first use of the shared layer
        x = self.fc(x)  # second use of the same layer
        return x

model = Net(12, 12)
print(model)

x = torch.rand(1, 12)
y = model(x)
make_dot(y, params=dict(model.named_parameters()))

Here I reuse self.fc twice in the forward pass.

The computational graph looks like this:

[image: computational graph produced by make_dot]

I am confused about the computational graph, and I am curious how to train this model with backpropagation. It seems to me that the gradient will loop forever. Thanks a lot.


Solution

  • There are no issues with your graph. You can train it the same way as any other feed-forward model (a minimal training sketch is shown after the list below).

    1. Regarding looping: Since it is a directed acyclic graph, there are no actual loops (check out the arrow directions).
    2. Regarding backprop: Let’s consider the fc.bias parameter. Since you are reusing the same layer twice, the bias has two outgoing arrows (it is used in two places in your net). During the backpropagation stage the direction is reversed: the bias will get gradients from those two places, and these gradients will add up (a small check of this is sketched after the list).
    3. Regarding the graph: An FC layer can be represented as Addmm(bias, x, T(weight)), where T is transposing and Addmm is matrix multiplication plus adding a vector. So you can see how the data (weight, bias) is passed into functions (Addmm, T); see the last sketch below.
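
To make the first point concrete, here is a minimal training sketch for your model. The batch size, learning rate, MSE loss, and SGD optimizer are arbitrary choices for illustration; nothing special is needed because the layer is shared.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input, output):
        super(Net, self).__init__()
        self.fc = nn.Linear(input, output)

    def forward(self, x):
        x = self.fc(x)  # first use of the shared layer
        x = self.fc(x)  # second use of the same layer
        return x

model = Net(12, 12)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # arbitrary optimizer/lr
criterion = nn.MSELoss()                                 # arbitrary loss

x = torch.rand(8, 12)       # dummy batch
target = torch.rand(8, 12)  # dummy targets

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()  # gradients from both uses of self.fc are summed automatically
    optimizer.step()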
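A quick way to check point 2: compare the shared layer's gradient with the sum of the gradients that two independent copies would receive when each is used once (fc1 and fc2 below are made-up names for those copies).

import torch
import torch.nn as nn

torch.manual_seed(0)

# shared layer used twice
fc = nn.Linear(4, 4)
x = torch.rand(2, 4)
fc(fc(x)).sum().backward()

# two independent copies with identical parameters, each used once
fc1, fc2 = nn.Linear(4, 4), nn.Linear(4, 4)
with torch.no_grad():
    for copy in (fc1, fc2):
        copy.weight.copy_(fc.weight)
        copy.bias.copy_(fc.bias)
fc2(fc1(x)).sum().backward()

# the shared layer's gradient equals the sum of the two per-use gradients
print(torch.allclose(fc.bias.grad, fc1.bias.grad + fc2.bias.grad))        # True
print(torch.allclose(fc.weight.grad, fc1.weight.grad + fc2.weight.grad))  # True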
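And for point 3: the Addmm node in the graph corresponds to torch.addmm, which computes bias + x @ weight.T. A quick sanity check on a single linear layer:

import torch
import torch.nn as nn

fc = nn.Linear(12, 12)
x = torch.rand(1, 12)

out_linear = fc(x)                                  # what the model does
out_addmm = torch.addmm(fc.bias, x, fc.weight.t())  # Addmm(bias, x, T(weight))

print(torch.allclose(out_linear, out_addmm))  # True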