I am trying to use the Adam optimizer to optimize certain values outside of a neural network. My technique wasn't working, so I created a simple example to see if it works:
import numpy as np
import torch

a = np.array([[0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0]])
b = np.array([[0.1, 0.2, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0, 0.0]])
a = torch.from_numpy(a)
b = torch.from_numpy(b)
a.requires_grad = True
b.requires_grad = True

optimizer = torch.optim.Adam(
    [b],
    lr=0.01,
    weight_decay=0.001
)

iterations = 200
for i in range(iterations):
    loss = torch.sqrt(((a.detach() - b.detach()) ** 2).sum(1)).mean()
    loss.requires_grad = True
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        print(b)
        print("loss:", loss)
My intuition was that b should get as close to a as possible to reduce the loss. But I see no change in any of the values of b, and the loss stays exactly the same. What am I missing here? Thanks.
You are detaching b, meaning the gradient won't flow all the way back to b when backpropagating, i.e. b won't change! Additionally, you don't need to set requires_grad = True on the loss, as this is done automatically since one of the operands already has the requires_grad flag on.
loss = torch.sqrt(((a.detach() - b) ** 2).sum(1)).mean()
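For completeness, here is a minimal sketch of the corrected loop (assuming the same a, b, and optimizer setup as in the question): with the detach removed from b, gradients reach b, the loss goes down, and b moves toward a.

import numpy as np
import torch

a = torch.from_numpy(np.array([[0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0]]))
b = torch.from_numpy(np.array([[0.1, 0.2, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0, 0.0]]))
b.requires_grad = True  # only b is optimized; a is a fixed target

optimizer = torch.optim.Adam([b], lr=0.01, weight_decay=0.001)

for i in range(200):
    # b stays attached to the graph, so backward() can populate b.grad
    loss = torch.sqrt(((a.detach() - b) ** 2).sum(1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if i % 10 == 0:
        print("iter", i, "loss:", loss.item())

print(b)  # values should have moved toward a

Note that with weight_decay=0.001 there is an extra L2 penalty pulling b toward zero, so b approaches a but won't match it exactly.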