Tags: python, pytorch, autograd, computation-graph

Why is the grad unavailable for a tensor moved to the GPU?


import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)  # True
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)  # None
print(b.grad)  # None (with a warning)

After running this code, a.grad is None even though a.requires_grad is True. But if the line a = a.cuda() is removed, a.grad is populated after loss.backward().


Solution

  • Running this code, PyTorch warns that the .grad attribute of a Tensor that is not a leaf Tensor is being accessed, and that it won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, call .retain_grad() on it; if you accessed the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5))
    a = a.cuda()
    print(a.requires_grad)
    b = a
    b = b - 2
    print('a ', a)
    print('b ', b)
    loss = (b - 1).pow(2).sum()

    a.retain_grad()  # added this line: keep the gradient of the non-leaf a

    loss.backward()
    print(a.grad)  # now populated
    

    This happens because of the line a = a.cuda(): .cuda() is an autograd-tracked copy that returns a new tensor, so a no longer refers to the original leaf Parameter but to a non-leaf result, and .grad is only accumulated on leaf tensors by default.
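    A minimal sketch of what the reassignment does, just inspecting is_leaf and grad_fn (assumes a CUDA device is available):

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5))
    print(a.is_leaf, a.grad_fn)   # True None  -> a is a leaf, .grad will be populated

    a = a.cuda()                  # rebinds a to the result of a tracked copy
    print(a.is_leaf, a.grad_fn)   # False <ToCopyBackward0 ...> (grad_fn name may vary by version)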

    You could keep a itself as the leaf tensor by creating the Parameter directly on the GPU

    a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))


    or by wrapping the already-moved tensor in a Parameter

    a = torch.nn.Parameter(torch.ones(5, 5).cuda())


    (Note that a bare a.cuda() without reassignment does not move the parameter at all, while a = a.cuda() rebinds a to a non-leaf copy, which is exactly the original problem.)

    Or explicitly request that the gradients of the non-leaf a be retained

    a.retain_grad()  # add this line before loss.backward()
    

    Discarding the gradients of intermediate tensors can save a significant amount of memory, so it is good practice to retain gradients only where you actually need them.
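    For completeness, a minimal sketch (again assuming a CUDA device) showing that constructing the Parameter on the GPU keeps it a leaf, so a.grad is populated without retain_grad():

    import torch

    a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))
    print(a.is_leaf)   # True: a is a leaf tensor living on the GPU

    b = a - 2
    loss = (b - 1).pow(2).sum()
    loss.backward()

    print(a.grad)      # populated: d(loss)/d(a) = 2 * (a - 3), i.e. a 5x5 tensor of -4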