Tags: error-handling, pytorch, linear-regression, gradient-descent

`RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn` for linear regression with gradient descent using torch


I am trying to implement a simple gradient descent for linear regression with pytorch as shown in this example in the docs:

import torch
from torch.autograd import Variable

learning_rate = 0.01
y = 5
x = torch.tensor([3., 0., 1.])
w = torch.tensor([2., 3., 9.], requires_grad=True)
b = torch.tensor(1., requires_grad=True)

for z in range(100):
    y_pred = b + torch.sum(w * x)
    loss = (y_pred - y).pow(2)
    # the two attempted fixes described below, left commented out here:
    # loss = Variable(loss, requires_grad = True)
    # loss.requires_grad = True
    loss.backward()
    
    with torch.no_grad():
        w = w - learning_rate * w.grad
        b = b - learning_rate * b.grad
        
        w.grad = None
        b.grad = None

When I run the code I get the error `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`.

I have read here and here that it could be solved using one of the following, but both fail:

  • loss = Variable(loss, requires_grad = True) results in TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

  • loss.requires_grad = True results in RuntimeError: you can only change requires_grad flags of leaf variables.

How can I fix this?


Solution

  • This error is caused by mixing calculation functions from torch with Python built-ins (the same goes for numpy or any other library that is not torch): autograd can only track computations done with torch functions, so the graph breaks as soon as a value leaves torch. A short demonstration is sketched after the code at the end of this answer.

    A good explanation can be read here.


  • The following is more of a workaround than a proper fix, but calling .retain_grad() on w and b before backward() solved the issue for me (a more conventional update loop is sketched at the very end of this answer):

    import torch

    learning_rate = 0.01
    y = 5
    x = torch.tensor([3., 0., 1.])
    w = torch.tensor([2., 3., 9.], requires_grad=True)
    b = torch.tensor(1., requires_grad=True)

    for z in range(100):
        y_pred = b + torch.sum(w * x)
        loss = (y_pred - y).pow(2)

        # after the first update w and b are no longer leaf tensors,
        # so retain_grad() is needed for their .grad to be kept after backward()
        w.retain_grad()
        b.retain_grad()
        loss.backward()

        w = w - learning_rate * w.grad
        b = b - learning_rate * b.grad
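
To demonstrate the first point, here is a minimal sketch (my own toy illustration, not code from the question): as soon as a value leaves torch, for example through float(), numpy or the math module, the result has no grad_fn, and calling backward() on it raises exactly this error.

import torch

w = torch.tensor(2., requires_grad=True)

# pure torch ops: the graph is tracked and backward() works
loss_ok = (w * 3 - 5).pow(2)
loss_ok.backward()            # fine, w.grad gets populated

# leaving torch via float() (numpy or math behave the same way):
# the result is a fresh tensor with no grad_fn
loss_bad = torch.tensor(float(w) * 3 - 5) ** 2
loss_bad.backward()           # RuntimeError: element 0 of tensors does not
                              # require grad and does not have a grad_fn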
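
And if the retain_grad() route feels too hacky, a more conventional way to write the update step (a minimal sketch of the usual PyTorch idiom, reusing the toy data from the question) is to modify w and b in place inside torch.no_grad() and then clear the gradients, so both stay leaf tensors that require grad on every iteration:

import torch

learning_rate = 0.01
y = 5
x = torch.tensor([3., 0., 1.])
w = torch.tensor([2., 3., 9.], requires_grad=True)
b = torch.tensor(1., requires_grad=True)

for z in range(100):
    y_pred = b + torch.sum(w * x)
    loss = (y_pred - y).pow(2)
    loss.backward()

    with torch.no_grad():
        # in-place updates keep w and b as the same leaf tensors
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # clear the accumulated gradients for the next iteration
        w.grad = None
        b.grad = None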