python, deep-learning, pytorch

In PyTorch, is it possible to freeze a module by a coefficient?


In my experiment, I want to first train a low-level model (L), then reuse it in a higher-level task (H). Usually I would freeze model L while training H. But is it possible to not freeze L completely, but rather to freeze it by a coefficient, so to speak?

I'm sorry if this sounds mathematically incorrect, but if we assume that a non-frozen model is affected by the gradient at a scale of 1.0, and a frozen one at a scale of 0.0, I would love to be able to vary this coefficient, so that the module is not completely frozen (0.0) but is still partially affected by gradient descent (for example, by 0.1). It is still important, though, that model L fully affects the result of H. In other words, it affects the forward result at a scale of 1.0, but at the back-propagation stage it is affected at a scale of 0.1.

The main idea behind this is for model L to get slightly fine-tuned w.r.t. the high-level task.

I googled the question, but the best I could find were these two threads, which I believe contain a hint to an answer, but I still can't figure out how to have separate "weights" for the forward and backward passes:

  1. https://discuss.pytorch.org/t/multiply-a-model-by-trainable-scalar/76308
  2. https://discuss.pytorch.org/t/different-forward-and-backward-weights/52800/10 This one seems to answer the question, but it looks too hacky and may be outdated. Are there more established, up-to-date ways of doing this?

Solution

  • From what I understand, you're trying to specify a different learning rate for different parts of your model. PyTorch optimizers support that option directly:

    import torch.optim as optim

    optimizer = optim.SGD([
        {'params': model.base.parameters()},           # uses the default lr of 1e-2
        {'params': model.L.parameters(), 'lr': 1e-3},  # smaller lr: L is only lightly updated
    ], lr=1e-2, momentum=0.9)


    From there, you can run a training loop as usual.
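    If you want the literal behavior described in the question (L affects the forward pass at full scale, but its gradients are scaled by a coefficient such as 0.1), one way to sketch it is with tensor hooks via `Tensor.register_hook`. The modules and sizes below are made-up placeholders for illustration:

    ```python
    import torch
    import torch.nn as nn

    # Hypothetical setup: L is the pretrained low-level module, H the
    # higher-level head (names from the question; sizes are assumptions).
    L = nn.Linear(4, 8)
    H = nn.Linear(8, 2)

    coeff = 0.1  # the "partial freezing" coefficient from the question

    # Register a hook on each parameter of L: the forward pass is untouched
    # (scale 1.0), but every gradient reaching L is multiplied by `coeff`.
    for p in L.parameters():
        p.register_hook(lambda g: g * coeff)

    x = torch.randn(16, 4)
    loss = H(L(x)).pow(2).mean()
    loss.backward()  # L's .grad fields are now scaled by coeff
    ```

    Any optimizer then sees the already-scaled gradients, so this composes with a single learning rate. That said, for plain SGD without momentum the two approaches are equivalent, so the per-parameter-group learning rate shown above is usually the simpler choice.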