Tags: deep-learning, neural-network, pytorch

How to add an additional output node during training in PyTorch?


I am making a class-incremental learning multi-label classifier. Here the model first trains with 7 labels. After training, another dataset emerges that contains the same labels except one more. I want to automatically add an extra node to the trained network and continue training on this new dataset. How can I do this?

import torch
import torch.nn as nn
import torch.optim as optim

class FeedForewardNN(nn.Module):
    def __init__(self, input_size, h1_size = 264, h2_size = 128, num_services=8):
        super().__init__()
        self.input_size = input_size
        self.lin1 = nn.Linear(input_size, h1_size)
        self.lin2 = nn.Linear(h1_size, h2_size)
        self.lin3 = nn.Linear(h2_size, num_services)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.lin1(x)
        x = self.relu(x)
        x = self.lin2(x)
        x = self.relu(x)
        x = self.lin3(x)
        x = self.sigmoid(x)
        return x

This is the architecture of the feedforward neural network. I first train it on the data set with only 7 classes.

# Create the network with 7 output nodes
input_size = len(x_columns)
net1 = FeedForewardNN(input_size, num_services=7)
alpha = 0.001

# Define optimizer and loss (the optimizer must receive net1's parameters)
optimizer = optim.Adam(net1.parameters(), lr=alpha)
criterion = nn.BCELoss()
running_loss = 0

#Training Loop
loss_list = []
auc_list = []

for i in range(len(train_data_x)):
    optimizer.zero_grad()

    outputs = net1(train_data_x[i])
    loss = criterion(outputs, train_data_y[i])
    loss.backward()
    optimizer.step()

However, I then want to add one additional output node, define its new weights while keeping the old trained weights, and continue training on the new data set.


Solution

  • I suggest replacing the layer with a new one of the desired shape, and then partially assigning its parameter values from the old one, as follows:

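    def increaseClassifier( m: torch.nn.Linear ):
        # weight has shape (out_features, in_features)
        old_shape = m.weight.shape

        # new layer with one more output node; its parameters start out random
        m2 = nn.Linear( old_shape[1], old_shape[0] +1 )
        # keep all trained rows and append one freshly initialized row
        m2.weight = nn.parameter.Parameter( torch.cat( (m.weight, m2.weight[0:1]) ) )
        m2.bias = nn.parameter.Parameter( torch.cat( (m.bias, m2.bias[0:1]) ) )
        return m2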
    
    class FeedForewardNN(nn.Module):
        ...
        def incrHere(self):
            self.lin3 = increaseClassifier( self.lin3 )
    

    UPD:

    Can you explain how the additional weights that come with this new output node are initialized?

    The initial weights for the new output node come from the creation of the new layer: the layer constructor creates its parameters with a random initialization. We then replace part of them with the trained weights, and the remaining part (one row of the weight matrix and one bias entry) is ready for the new training.

    m2.weight = nn.parameter.Parameter( torch.cat( (m.weight, m2.weight[0:1]) ) )
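
To make the above concrete, here is a minimal sketch of the same idea written with in-place copies, followed by a check that the trained rows survive. The helper name `grow_output_layer`, the `extra` parameter, and the sizes 128 and 7 are illustrative assumptions rather than part of the original code. The sketch also re-creates the optimizer, because an optimizer built before the swap still holds references to the old layer's parameters.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def grow_output_layer(old: nn.Linear, extra: int = 1) -> nn.Linear:
    """Hypothetical helper: copy `old` into a layer with `extra` more outputs."""
    new = nn.Linear(old.in_features, old.out_features + extra,
                    bias=old.bias is not None)
    # Match the device and dtype of the trained layer.
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    with torch.no_grad():
        # The first out_features rows take the trained values; the remaining
        # rows keep the random values assigned by the nn.Linear constructor.
        new.weight[:old.out_features].copy_(old.weight)
        if old.bias is not None:
            new.bias[:old.out_features].copy_(old.bias)
    return new


torch.manual_seed(0)
old_head = nn.Linear(128, 7)            # stands in for the trained lin3
new_head = grow_output_layer(old_head)  # now has 8 output nodes

x = torch.randn(4, 128)
old_out = old_head(x)
new_out = new_head(x)

# The first 7 outputs should match the old layer; the 8th is untrained.
same_old_logits = torch.allclose(old_out, new_out[:, :7], atol=1e-5)

# Build a fresh optimizer so it tracks the new parameters
# (in the full model this would be net1.parameters()).
optimizer = optim.Adam(new_head.parameters(), lr=0.001)
```

Note that a freshly built Adam optimizer starts with empty moment estimates, and the targets for the second training phase need 8 columns to match the new output size.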