Tags: neural-network, chainer

How to implement a separate learning rate or optimizer for different layers in Chainer?


In my neural network, I want to use a different learning rate or a different optimizer (e.g. AdaGrad) in each layer. How can I implement this? Thanks.


Solution

  • After you set up the optimizer with the model, each parameter of every link in the model has an update_rule attribute (an AdaGradRule in this case), which defines how that parameter is updated.

    Each update_rule also has its own hyperparam attribute, so you can overwrite these hyperparameters separately for each parameter in a link (a short verification sketch follows the sample code below).

    Below is some sample code:

    import chainer
    import chainer.functions as F
    import chainer.links as L


    class MLP(chainer.Chain):
    
        def __init__(self, n_units, n_out):
            super(MLP, self).__init__()
            with self.init_scope():
                # input size of each layer will be inferred when omitted
                self.l1 = L.Linear(n_units)  # n_in -> n_units
                self.l2 = L.Linear(n_units)  # n_units -> n_units
                self.l3 = L.Linear(n_out)  # n_units -> n_out
    
        def __call__(self, x):
            h1 = F.relu(self.l1(x))
            h2 = F.relu(self.l2(h1))
            return self.l3(h2)
    
    # `args.unit` and `args.gpu` are assumed to come from the script's
    # command-line argument parsing (argparse).
    model = MLP(args.unit, 10)
    classifier_model = L.Classifier(model)
    if args.gpu >= 0:
        chainer.cuda.get_device_from_id(args.gpu).use()  # Make a specified GPU current
        classifier_model.to_gpu()  # Copy the model to the GPU
    
    # Setup an optimizer
    optimizer = chainer.optimizers.AdaGrad()
    optimizer.setup(classifier_model)
    
    # --- After `optimizer.setup()`, you can modify `hyperparam` of each parameter ---
    
    # 1. Change the hyperparameter of the update rule for one specific parameter.
    #    `l1` is a `Linear` link, which has parameters `W` and `b`.
    classifier_model.predictor.l1.W.update_rule.hyperparam.lr = 0.01
    
    # 2. Change the hyperparameter for all parameters (`W` and `b`) of one link
    for param in classifier_model.predictor.l2.params():
        param.update_rule.hyperparam.lr = 0.01
    
    # --- You can now set up a trainer to train the model as usual...
    ...
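
    As a quick sanity check (a minimal sketch, not part of the original answer), you can walk over all parameters with namedparams() and print the learning rate each update_rule will actually use. If you need a genuinely different optimizer (not just a different learning rate) for one layer, replacing a parameter's update_rule with one created via another optimizer's create_update_rule() should also work; treat that part as an assumption rather than documented usage.

    # Sketch: print the effective learning rate of every parameter
    for name, param in classifier_model.namedparams():
        print(name, param.update_rule.hyperparam.lr)
    # `/predictor/l1/W` and both `l2` parameters should report 0.01;
    # the others fall back to AdaGrad's default lr.

    # Assumption: swap in a different optimizer's update rule for one layer
    sgd = chainer.optimizers.SGD(lr=0.1)
    for param in classifier_model.predictor.l3.params():
        param.update_rule = sgd.create_update_rule()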