I am trying to implement a simple gradient descent for linear regression with PyTorch, as shown in this example in the docs:
import torch
from torch.autograd import Variable
learning_rate = 0.01
y = 5
x = torch.tensor([3., 0., 1.])
w = torch.tensor([2., 3., 9.], requires_grad=True)
b = torch.tensor(1., requires_grad=True)
for z in range(100):
    y_pred = b + torch.sum(w * x)
    loss = (y_pred - y).pow(2)
    loss = Variable(loss, requires_grad=True)
    # loss.requires_grad = True
    loss.backward()
    with torch.no_grad():
        w = w - learning_rate * w.grad
        b = b - learning_rate * b.grad
    w.grad = None
    b.grad = None
When I run the code, I get the error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
I have read here and here that it could be solved, but both suggestions fail for me:

loss = Variable(loss, requires_grad=True) results in TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

loss.requires_grad = True results in RuntimeError: you can only change requires_grad flags of leaf variables.
How can I fix this?
This error was caused by mixing torch's calculation functions with Python built-ins (the same goes for NumPy or any other non-torch library). It means that torch's autograd breaks, because it cannot track operations that happen outside of torch.
A good explanation can be read here.
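To make the error message itself concrete, here is a minimal toy reproduction (not taken from the question, just an illustration of what "does not have a grad_fn" means):

import torch

t = torch.tensor(2.0)  # requires_grad defaults to False
out = t * 3            # out has no grad_fn, since nothing was tracked
out.backward()         # raises the same RuntimeError as in the question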
This is more of a hack than a fully appropriate fix: calling .retain_grad() on w and b before backward() solved the issue for me:
import torch

learning_rate = 0.01
y = 5
x = torch.tensor([3., 0., 1.])
w = torch.tensor([2., 3., 9.], requires_grad=True)
b = torch.tensor(1., requires_grad=True)

for z in range(100):
    y_pred = b + torch.sum(w * x)
    loss = (y_pred - y).pow(2)
    # after the first iteration w and b are no longer leaf tensors,
    # so ask autograd to keep populating their .grad
    w.retain_grad()
    b.retain_grad()
    loss.backward()
    w = w - learning_rate * w.grad
    b = b - learning_rate * b.grad
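For comparison, here is a minimal sketch of the more conventional pattern (assuming the goal is plain gradient descent on the same toy problem): keep w and b as leaf tensors by updating them in place under torch.no_grad() and zeroing their gradients each iteration, instead of rebinding them to new tensors:

import torch

learning_rate = 0.01
y = 5
x = torch.tensor([3., 0., 1.])
w = torch.tensor([2., 3., 9.], requires_grad=True)
b = torch.tensor(1., requires_grad=True)

for z in range(100):
    y_pred = b + torch.sum(w * x)
    loss = (y_pred - y).pow(2)
    loss.backward()
    with torch.no_grad():
        # in-place updates keep w and b as leaf tensors
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # reset gradients so they do not accumulate across iterations
        w.grad.zero_()
        b.grad.zero_()

This avoids both retain_grad() and the ever-growing autograd graph that the reassignment in the hack above creates.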