
How do I change the output size of torch.nn.Linear while preserving existing parameters?


I am programming a PyTorch model which includes the following:

self.basis_mat=torch.nn.Linear(
in_features=self.basis_mat_params[0],
out_features=self.basis_mat_params[1],
bias=self.basis_mat_params[2]).to(device)

Now what I want to do is dynamically increment the out_features of self.basis_mat. However, I also want to preserve any previously trained parameters. In other words, as out_features grows, any new parameters should be randomly initialized while the existing ones are left unchanged.

So, what should I do?

I didn't really know what to try. I checked the documentation, but it turns out that in torch.nn.Linear the out_features parameter is fixed at construction time. Instantiating a new torch.nn.Linear (which I guess would be quite slow, so even if something like that works I'd only use it as a last resort) gives freshly randomized parameters with no obvious way to carry the old ones over.


Solution

  • You can do the following,

    import torch
    from torch import nn

    dim1 = 3
    dim2 = 4
    dim3 = 5

    # Stand-ins for the previously trained parameters; in practice use
    # older_linear_layer.weight.t().detach() and older_linear_layer.bias.detach()
    older_weight_parameters = torch.zeros(dim1, dim2)
    older_bias_parameters = torch.zeros(dim2)

    linear_layer = nn.Linear(dim1, dim2 + dim3)
    # Overwrite the first dim2 output rows with the older parameter values;
    # the remaining dim3 rows keep their random initialization
    linear_layer.weight.data[:dim2, :] = older_weight_parameters.t()
    linear_layer.bias.data[:dim2] = older_bias_parameters

    The linear layer will then have the weight and bias values shown below. You can see that the older parameter values (zeros in this example) have replaced the random initialization in the first dim2 rows, while the remaining rows keep their random values.

    linear_layer.weight, linear_layer.bias
    (Parameter containing:
     tensor([[ 0.0000,  0.0000,  0.0000],
             [ 0.0000,  0.0000,  0.0000],
             [ 0.0000,  0.0000,  0.0000],
             [ 0.0000,  0.0000,  0.0000],
             [ 0.3455, -0.3385,  0.4746],
             [-0.2067, -0.4950, -0.3668],
             [-0.4012, -0.3951, -0.3772],
             [-0.1393,  0.3602, -0.0460],
             [-0.4641,  0.2152,  0.4031]], requires_grad=True),
     Parameter containing:
     tensor([ 0.0000,  0.0000,  0.0000,  0.0000,  0.0502, -0.3423,  0.3871, -0.5218,
             -0.4322], requires_grad=True))

    The weight and bias parameters of a Linear layer are tensors like any other: you can manipulate them with ordinary tensor operations.
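
    Since you want to do this repeatedly during training, the copy step can be wrapped in a small helper. This is just a sketch; `expand_out_features` is a name introduced here, not a PyTorch API, and it uses `torch.no_grad()` for the in-place copy (a common alternative to writing through `.data`):

    ```python
    import torch
    from torch import nn

    def expand_out_features(layer: nn.Linear, extra: int) -> nn.Linear:
        """Return a new Linear layer with `extra` additional output features.

        The old weights and bias are copied into the first rows; the new
        rows keep their fresh random initialization.
        """
        has_bias = layer.bias is not None
        new_layer = nn.Linear(layer.in_features,
                              layer.out_features + extra,
                              bias=has_bias)
        with torch.no_grad():
            new_layer.weight[:layer.out_features] = layer.weight
            if has_bias:
                new_layer.bias[:layer.out_features] = layer.bias
        return new_layer

    old = nn.Linear(3, 4)
    new = expand_out_features(old, 5)
    assert new.weight.shape == (9, 3)
    assert torch.equal(new.weight[:4], old.weight)
    ```

    In your model you would then reassign, e.g. `self.basis_mat = expand_out_features(self.basis_mat, n).to(device)`. Note that if an optimizer already holds references to the old parameters, you will also need to re-register the new ones with it.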