I am trying to optimise parameter values using a torch optimiser, but the parameters are on vastly different scales, i.e. one parameter has values in the thousands while others are between 0 and 1. For example, in this made-up case there are two parameters: one has an optimal value of 0.1 and the other an optimal value of 20. How can I modify this code so it applies a sensible learning rate to each parameter, say 1e-3 and 0.1?
import torch as pt
# Objective function
def f(x, y):
    return (10 - 100 * x) ** 2 + (y - 20) ** 2
# Optimal parameters
print("Optimal value:", f(0.1, 20))
# Initial parameters
hp = pt.Tensor([1, 10])
print("Initial value", f(*hp))
# Optimiser
hp.requires_grad = True
optimizer = pt.optim.Adam([hp])
n = 5
for i in range(n):
    optimizer.zero_grad()
    loss = f(*hp)
    loss.backward()
    optimizer.step()
hp.requires_grad = False
print("Final parameters:", hp)
print("Final value:", f(*hp))
The torch.optim.Optimizer class accepts a list of dictionaries as the params argument, where each dictionary defines one parameter group. In each dictionary you set params along with any other arguments specific to that group; any argument you do not provide falls back to the defaults passed to the optimizer. Refer to the official documentation for more information.
Here is the updated code:
import torch as pt
# Objective function
def f(x, y):
    return (10 - 100 * x) ** 2 + (y - 20) ** 2
# Optimal parameters
print("Optimal value:", f(0.1, 20))
# Initial parameters
hp = pt.Tensor([1]), pt.Tensor([10])
print("Initial value", f(*hp))
# Optimiser
for param in hp:
    param.requires_grad = True
# eps and betas are shared between the two groups
optimizer = pt.optim.Adam([{"params": [hp[0]], "lr": 1e-3}, {"params": [hp[1]], "lr": 0.1}])
# optimizer = pt.optim.Adam([{"params": [hp[0]], "lr": 1}, {"params": [hp[1]], "lr": 2.2}])
n = 5
for i in range(n):
    optimizer.zero_grad()
    loss = f(*hp)
    loss.backward()
    optimizer.step()
for param in hp:
    param.requires_grad = False
print("Final parameters:", hp)
print("Final value:", f(*hp))
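You can confirm which hyperparameters each group ends up with by inspecting optimizer.param_groups. A small sketch (using the same two groups as above; the betas and eps values shown are Adam's documented defaults, since they were not set per group):

```python
import torch as pt

p0 = pt.tensor([1.0], requires_grad=True)
p1 = pt.tensor([10.0], requires_grad=True)
optimizer = pt.optim.Adam([{"params": [p0], "lr": 1e-3},
                           {"params": [p1], "lr": 0.1}])

# Each entry in param_groups carries its own hyperparameters; anything
# not specified in the dictionary (betas, eps, ...) uses Adam's defaults.
for group in optimizer.param_groups:
    print(group["lr"], group["betas"], group["eps"])
```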
Try using {"lr": 1} and {"lr": 2.2} for the first and second parameters, respectively; with those rates the final value after the five steps is 19.9713.
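As an alternative to per-group learning rates, you can reparameterise so that both optimised values live on a comparable scale and a single learning rate works. A minimal sketch (not part of the answer above; the scale factors are assumed from the known rough magnitudes of the two parameters):

```python
import torch as pt

def f(x, y):
    return (10 - 100 * x) ** 2 + (y - 20) ** 2

# Assumed rough scale of each parameter; dividing by it puts both
# optimised values on an order-1 scale.
scale = pt.tensor([1.0, 100.0])
hp_scaled = pt.tensor([1.0, 0.1], requires_grad=True)  # initial [1, 10] / scale

optimizer = pt.optim.Adam([hp_scaled], lr=1e-2)
for _ in range(200):
    optimizer.zero_grad()
    loss = f(*(hp_scaled * scale))  # map back to the original scale
    loss.backward()
    optimizer.step()
print("Final parameters:", (hp_scaled * scale).detach())
```

This keeps a single parameter tensor and a single learning rate, at the cost of having to pick the scale factors by hand.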