The title is quite self-explanatory. I have the following code:
import torch
x = torch.tensor([3., 4.], requires_grad=True)
A = torch.tensor([[x[0], x[1]],
                  [x[1], x[0]]], requires_grad=True)
f = torch.norm(A)
I would like to compute the gradient of f with respect to x, but if I type x.grad I just get None. If I use the more explicit command torch.autograd.grad(f, x) instead of f.backward(), I get:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
The problem might be that when you take a slice of a leaf tensor, it returns a non-leaf tensor, like so:
>>> x.is_leaf
True
>>> x[0].is_leaf
False
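So what's happening is that x never makes it into the graph of f: torch.tensor(...) copies the values of the slices x[0] and x[1] into a new leaf tensor A, and that copy is not tracked by autograd.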
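One way to check this is to look at where the gradient actually ends up. The sketch below is only an illustration: it assumes that torch.tensor(...) keeps nothing but the numeric values, so it passes plain Python floats (the names a and b are made up for this example) rather than the slices themselves.

```python
import torch

x = torch.tensor([3., 4.], requires_grad=True)

# Assumption for illustration: torch.tensor() only keeps the numeric values,
# so we hand it plain floats taken from x (a and b are hypothetical names).
a, b = x.detach().tolist()

A = torch.tensor([[a, b],
                  [b, a]], requires_grad=True)
f = torch.norm(A)
f.backward()

print(A.is_leaf, A.grad_fn)  # A is a leaf of its own, with no grad_fn linking it to x
print(x.grad)                # x was never part of the graph of f
print(A.grad)                # the gradient was accumulated on A instead
```

If A.grad is populated while x.grad stays None, the gradient is stopping at A, which is consistent with the RuntimeError from torch.autograd.grad(f, x).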
Try this instead:
>>> import torch
>>> x = torch.tensor([3., 4.], requires_grad=True)
>>> xt = torch.empty_like(x).copy_(x.flip(0))
>>> A = torch.stack([x,xt])
>>> f = torch.norm(A)
>>> f.backward()
>>> x.grad
tensor([0.8485, 1.1314])
The difference is that PyTorch knows to add x to the graph, so f.backward() populates its gradient. Here you'll find a few different ways of copying tensors and the effect each has on the graph.
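As a possible simplification (a sketch, not necessarily the only way to do it), the intermediate copy can probably be skipped: x.flip(0) is itself a differentiable operation, so stacking it directly should keep A connected to x. The example also uses torch.autograd.grad, the call from the question, which returns the gradient as a tuple instead of writing it to x.grad.

```python
import torch

x = torch.tensor([3., 4.], requires_grad=True)

# flip and stack are both tracked by autograd, so A has a grad_fn
# and the graph of f reaches back to x.
A = torch.stack([x, x.flip(0)])
f = torch.norm(A)

# autograd.grad returns one gradient per input, as a tuple.
(grad_x,) = torch.autograd.grad(f, x)
print(A.is_leaf)  # False: A is the result of an operation on x
print(grad_x)     # tensor([0.8485, 1.1314])
```

The value matches the result of f.backward() above: f = sqrt(2 * (3^2 + 4^2)) = sqrt(50), and each component of the gradient is 2 * x_i / f.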