python, pytorch, autograd

Taking the derivative of the zero function in PyTorch gives a RuntimeError


I have the following simple code:


import torch
from torch import autograd

def f(x):
    return x[:,0] + x[:,1]

def g(x):
    return torch.zeros_like(x[:,0])

def main():
    x = torch.tensor([[0.3, 0.3],
                      [0.6, 0.3],
                      [0.3, 0.6],
                      [0.6, 0.6]])
    x.requires_grad_()
    grad_myf = autograd.grad(outputs=f(x), inputs=x, grad_outputs=torch.ones_like(f(x)),
                             create_graph=True, retain_graph=True, only_inputs=True)[0]
    print(grad_myf)

main()

This outputs the right thing:

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])

Now I want to take the derivative of the g function. The g function is supposed to return 0 regardless of the value of x, so its derivative should be zero. So I write

    grad_myg = autograd.grad(outputs=g(x), inputs=x, grad_outputs=torch.ones_like(g(x)), create_graph=True, retain_graph=True, only_inputs=True)[0]
    print(grad_myg)

and I get the error message "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".

Why doesn't it work? Do I need to redefine g in a different way? Something like

def g(x):
    return 0*x

does work, but I don't know whether this is the best approach; the way I originally defined g seems like the more natural way.


Solution

  • You get RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn because the tensor returned by torch.zeros_like(x[:,0]) has requires_grad=False by default. If you set requires_grad=True on it, you get a different error instead: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
  • The reason is that the result of g is a brand-new tensor that is not connected to the computation graph containing x (it is an isolated node), so autograd has nothing to differentiate with respect to x. 0 * x works because it is an operation performed on x, so its output stays in the graph and its derivative with respect to x is well defined (and equal to zero everywhere).
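
For completeness, here is a minimal sketch of the two ways around this (the function names g_connected and g_detached are just illustrative, not from the original code): either build the zero output out of x so it stays in the graph, or keep the disconnected definition and pass allow_unused=True, in which case autograd.grad returns None for x and you substitute zeros yourself.

import torch
from torch import autograd

x = torch.tensor([[0.3, 0.3],
                  [0.6, 0.3],
                  [0.3, 0.6],
                  [0.6, 0.6]], requires_grad=True)

# Option 1: derive the zero output from x, so the result stays in the graph.
def g_connected(x):
    return 0 * x[:, 0]

grad_connected = autograd.grad(outputs=g_connected(x), inputs=x,
                               grad_outputs=torch.ones_like(g_connected(x)),
                               create_graph=True)[0]
print(grad_connected)   # a (4, 2) tensor of zeros

# Option 2: keep the disconnected definition, but tell autograd that an
# unused input is expected; the gradient then comes back as None.
def g_detached(x):
    return torch.zeros_like(x[:, 0], requires_grad=True)

grad_detached = autograd.grad(outputs=g_detached(x), inputs=x,
                              grad_outputs=torch.ones_like(x[:, 0]),
                              allow_unused=True)[0]
print(grad_detached)    # None -- substitute torch.zeros_like(x) if you need explicit zeros

Option 1 is usually the cleaner choice when g is one branch of a larger model, because downstream code never has to special-case a None gradient.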