import torch
a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)
print(b.grad)
After executing this code, a.grad is None although a.requires_grad is True. But if the line a = a.cuda() is removed, a.grad is available after loss.backward(). PyTorch also prints the following warning when the gradient of a non-leaf tensor is accessed:
The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
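For reference, the rebinding can be seen directly by checking is_leaf; the following is a quick diagnostic sketch, assuming a CUDA device is available:
import torch
a = torch.nn.Parameter(torch.ones(5, 5))
print(a.is_leaf)   # True: a user-created leaf, so .grad will be populated
a = a.cuda()
print(a.is_leaf)   # False: a is now the output of the .cuda() copy operation
print(a.grad_fn)   # the backward node of that copy (e.g. a CopyBackwards object)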
import torch
a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
a.retain_grad() # added this line
loss.backward()
print(a.grad)
That happens because of your line a = a.cuda(), which overrides the original value of a: .cuda() returns a new, non-leaf tensor (the output of a differentiable copy operation), so the name a no longer refers to the leaf Parameter, and autograd only populates .grad on leaf tensors during backward().
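Where the gradient actually goes can be seen in a short sketch; the name original_a is introduced here only for illustration, and a CUDA device is assumed:
import torch
original_a = torch.nn.Parameter(torch.ones(5, 5))  # the user-created leaf
a = original_a.cuda()                               # non-leaf copy on the GPU
b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)           # None (with the non-leaf warning shown above)
print(original_a.grad)  # populated: the gradient flowed back to the CPU leaf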
You could keep a as a leaf on the CPU and call .cuda() only where its value is used, without rebinding the name a (the copy is differentiable, so the gradient still flows back to the CPU Parameter):
a = torch.nn.Parameter(torch.ones(5, 5))
b = a.cuda() - 2  # a stays a leaf; a.grad is populated after backward()
Or create the Parameter directly on the GPU, so the CUDA tensor itself is the leaf (a quick end-to-end check is sketched after these options):
a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))
a = torch.nn.Parameter(torch.ones(5, 5).cuda())
Or explicitly request that the gradient of the (now non-leaf) a be retained:
a.retain_grad()  # added before loss.backward(), as in the second snippet above
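A minimal end-to-end check of the create-directly-on-the-GPU option above, assuming a CUDA device is available:
import torch
a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))
print(a.is_leaf)   # True: the CUDA tensor itself is the user-created leaf
loss = ((a - 2) - 1).pow(2).sum()
loss.backward()
print(a.grad)      # populated (a tensor of -4s here), with no warning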
Not storing the gradients of intermediate tensors saves a significant amount of memory, so it is good practice to retain gradients only where you actually need them.