python · optimization · pytorch · torch

Torch optimisers with differently scaled parameters


I am trying to optimise parameter values using a torch optimiser, but the parameters are on vastly different scales, e.g. one parameter has values in the thousands while others are between 0 and 1. In this made-up example there are two parameters: one has an optimal value of 0.1 and the other an optimal value of 20. How can I modify this code so that it applies a sensible learning rate to each parameter, say 1e-3 and 0.1?

import torch as pt
# Objective function
def f(x, y):
    return (10 - 100 * x) ** 2 + (y - 20) ** 2 
# Optimal parameters
print("Optimal value:", f(0.1, 20))
# Initial parameters
hp = pt.Tensor([1, 10])
print("Initial value", f(*hp))
# Optimiser
hp.requires_grad = True
optimizer = pt.optim.Adam([hp])
n = 5
for i in range(n):
    optimizer.zero_grad()
    loss = f(*hp)
    loss.backward()
    optimizer.step()
hp.requires_grad = False
print("Final parameters:", hp)
print("Final value:", f(*hp))

Solution

  • The torch.optim.Optimizer class accepts a list of dictionaries as its params argument, where each dictionary defines a parameter group. In each dictionary you specify params plus any optimizer arguments you want to override for that group (such as lr). Any argument you omit from a dictionary falls back to the value passed to the optimizer itself (see the sketch after the code below). Refer to the official documentation for more information.

    Here is the updated code:

    import torch as pt
    
    
    # Objective function
    def f(x, y):
        return (10 - 100 * x) ** 2 + (y - 20) ** 2
    
    
    # Optimal parameters
    print("Optimal value:", f(0.1, 20))
    # Initial parameters: one tensor per parameter so each can be placed in its own parameter group
    hp = pt.Tensor([1]), pt.Tensor([10])
    print("Initial value", f(*hp))
    # Optimiser
    for param in hp:
        param.requires_grad = True
    # eps and betas are shared between the two groups; only lr differs
    optimizer = pt.optim.Adam([{"params": [hp[0]], "lr": 1e-3}, {"params": [hp[1]], "lr": 0.1}])
    # optimizer = pt.optim.Adam([{"params": [hp[0]], "lr": 1}, {"params": [hp[1]], "lr": 2.2}])
    
    n = 5
    for i in range(n):
        optimizer.zero_grad()
        loss = f(*hp)
        loss.backward()
        optimizer.step()
    for param in hp:
        param.requires_grad = False
    print("Final parameters:", hp)
    print("Final value:", f(*hp))
    
    

    Try {"lr": 1} and {"lr": 2.2} for the first and second parameters, respectively (the commented-out optimiser above); it results in a final value of 19.9713.
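
    As a quick sanity check, you can read optimizer.param_groups back after construction to see which settings each group ended up with. The snippet below is a minimal sketch along those lines, assuming the same two-parameter setup as above; it only prints each group's lr together with the shared Adam defaults (betas, eps).

    import torch as pt

    # Two parameters on different scales, each placed in its own group
    x = pt.tensor([1.0], requires_grad=True)
    y = pt.tensor([10.0], requires_grad=True)
    optimizer = pt.optim.Adam([{"params": [x], "lr": 1e-3}, {"params": [y], "lr": 0.1}])

    # Each entry in optimizer.param_groups holds that group's own lr plus the
    # optimiser-level defaults (betas, eps, weight_decay, ...) it inherited
    for i, group in enumerate(optimizer.param_groups):
        print(f"group {i}: lr={group['lr']}, betas={group['betas']}, eps={group['eps']}")

    Both groups should report identical betas and eps while their lr values differ, which is the fallback behaviour described above.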