I have the following simple code:
import torch
from torch import autograd

def f(x):
    return x[:, 0] + x[:, 1]

def g(x):
    return torch.zeros_like(x[:, 0])

def main():
    x = torch.tensor([[0.3, 0.3],
                      [0.6, 0.3],
                      [0.3, 0.6],
                      [0.6, 0.6]])
    x.requires_grad_()
    grad_myf = autograd.grad(outputs=f(x), inputs=x, grad_outputs=torch.ones_like(f(x)), create_graph=True, retain_graph=True, only_inputs=True)[0]
    print(grad_myf)
This outputs the right thing:
tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])
Now I want to take the derivative of the g function. The g function is supposed to return 0 regardless of the value of x, so its derivative should be zero. So I write
grad_myg = autograd.grad(outputs=g(x), inputs=x, grad_outputs=torch.ones_like(g(x)), create_graph=True, retain_graph=True, only_inputs=True)[0]
print(grad_myg)
and I get the error message "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn".
Why doesn't it work? Do I need to redefine g in a different way? Something like
def g(x):
    return 0*x
does work, but I don't know if this is the best way; the way I originally defined g seems like the natural one.
You get "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" because the tensor returned by torch.zeros_like(x[:,0]) has requires_grad=False by default. If you change it to True, you get another error: "RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior."
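For example, a minimal sketch (reusing the same x from your question): with requires_grad=True on the zeros and allow_unused=True, the call no longer raises, but the gradient you get back for x is simply None, because the output never depended on x:

def g(x):
    # leaf tensor that requires grad, but it is not computed from x
    return torch.zeros_like(x[:, 0], requires_grad=True)

grad_myg = autograd.grad(outputs=g(x), inputs=x,
                         grad_outputs=torch.ones_like(g(x)),
                         allow_unused=True)[0]
print(grad_myg)  # None: x does not appear in the graph of g(x)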
This happens because the result of g is a completely new tensor: it is not part of the graph built from x (it is like an isolated node), so torch cannot compute a gradient of it with respect to x. 0 * x works because its output is actually computed from x, so torch can trace it back and differentiate it.
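As a small sketch of the connected version (I use x[:, 0] instead of the full x only to keep the original output shape of g; that adjustment is mine): since the zeros are now computed from x, autograd can differentiate through them and returns an actual tensor of zeros rather than an error:

def g(x):
    # the output is computed from x, so it stays in the autograd graph
    return 0 * x[:, 0]

grad_myg = autograd.grad(outputs=g(x), inputs=x,
                         grad_outputs=torch.ones_like(g(x)),
                         create_graph=True, retain_graph=True)[0]
print(grad_myg)  # a 4x2 tensor of zeros, as expected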