Tags: python, pytorch, autograd, computation-graph

Why is the grad unavailable for a tensor moved to the GPU?


import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)  # True
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)  # None
print(b.grad)  # None (with a warning)

After running this code, a.grad is None even though a.requires_grad is True. But if the line a = a.cuda() is removed, a.grad is populated after loss.backward().


Solution

  • Running this code, PyTorch warns that the .grad attribute of a Tensor that is not a leaf Tensor is being accessed, and that it won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, call .retain_grad() on it; if you accessed the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5))
    a = a.cuda()
    print(a.requires_grad)
    b = a
    b = b - 2
    print('a ', a)
    print('b ', b)
    loss = (b - 1).pow(2).sum()

    a.retain_grad()  # added this line: keep the gradient of the non-leaf a

    loss.backward()
    print(a.grad)  # now populated
    

    This happens because of the line a = a.cuda(): .cuda() is an autograd-tracked copy that returns a new tensor, so a no longer refers to the original leaf Parameter but to a non-leaf result, and .grad is only accumulated on leaf tensors by default.
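    A minimal sketch of what the reassignment does, just inspecting is_leaf and grad_fn (assumes a CUDA device is available):

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5))
    print(a.is_leaf, a.grad_fn)   # True None  -> a is a leaf, .grad will be populated

    a = a.cuda()                  # rebinds a to the result of a tracked copy
    print(a.is_leaf, a.grad_fn)   # False <ToCopyBackward0 ...> (grad_fn name may vary by version)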

    You could keep a itself as the leaf tensor by creating the Parameter directly on the GPU

    a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))


    or by wrapping the already-moved tensor in a Parameter

    a = torch.nn.Parameter(torch.ones(5, 5).cuda())


    (Note that a bare a.cuda() without reassignment does not move the parameter at all, while a = a.cuda() rebinds a to a non-leaf copy, which is exactly the original problem.)

    Or explicitly request that the gradients of the non-leaf a be retained

    a.retain_grad()  # add this line before loss.backward()
    

    Discarding the gradients of intermediate tensors can save a significant amount of memory, so it is good practice to retain gradients only where you actually need them.
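    For completeness, a minimal sketch (again assuming a CUDA device) showing that constructing the Parameter on the GPU keeps it a leaf, so a.grad is populated without retain_grad():

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))
    print(a.is_leaf)   # True: a is a leaf tensor living on the GPU

    b = a - 2
    loss = (b - 1).pow(2).sum()
    loss.backward()

    print(a.grad)      # populated: d(loss)/d(a) = 2 * (a - 3), i.e. a 5x5 tensor of -4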