Tags: python, pytorch, torch, gradient-descent, autograd

Understanding gradient computation using backward() in PyTorch


I'm trying to understand the basic PyTorch autograd system:

import torch

x = torch.tensor(10., requires_grad=True)
print('tensor:', x)
x.backward()  # no function applied; backward() is called on x itself
print('gradient:', x.grad)

output:

tensor: tensor(10., requires_grad=True)
gradient: tensor(1.)

Since x is a scalar constant and no function is applied to it, I expected 0. as the gradient output. Why is the gradient 1. instead?


Solution

  • Whenever you call value.backward(), you compute the derivative of value (in your case value == x) with respect to all your parameters (in your case that is just x). Roughly speaking, "parameters" means all leaf tensors with requires_grad=True that are involved in computing value. Here the output and the parameter are the same tensor, so:

    x.grad = dx / dx = 1
    

    To add to that: with automatic differentiation you only ever compute with "constant" values. All your functions or networks are evaluated at a concrete point, and the gradient you get is the gradient evaluated at that same point. No symbolic computation takes place. All the information needed to compute the gradient is encoded in the computation graph.
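
To make the "evaluated at a concrete point" idea tangible, here is a minimal sketch. The helper grad_at is a hypothetical name introduced only for this illustration and is not part of PyTorch. It builds a fresh tensor at the given point, applies a function, and calls backward() on the result. With the identity function it reproduces the case from the question (dx/dx = 1); with x ** 2 the same code yields 2 * x evaluated at whichever point was passed in.

```python
import torch


def grad_at(f, point):
    """Return df/dx evaluated at `point` (hypothetical helper for illustration)."""
    x = torch.tensor(point, requires_grad=True)  # fresh leaf tensor at the given point
    y = f(x)                                     # builds the computation graph
    y.backward()                                 # computes dy/dx and stores it in x.grad
    return x.grad


# Case from the question: the output is x itself, so the result is dx/dx.
grad_identity = grad_at(lambda x: x, 10.)

# y = x**2 has derivative 2*x; autograd returns its value at the chosen point only.
grad_at_10 = grad_at(lambda x: x ** 2, 10.)
grad_at_3 = grad_at(lambda x: x ** 2, 3.)

print(grad_identity)  # tensor(1.)
print(grad_at_10)     # tensor(20.)
print(grad_at_3)      # tensor(6.)
```

Note that the helper returns plain numbers (as tensors), never a formula: changing the point changes the result, which is the sense in which no symbolic computation takes place.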