Tags: python, machine-learning, pytorch, torch

Why does a torch gradient increase linearly every time the function is backpropagated?


I am trying to understand how PyTorch backpropagation works, using the following code.

import torch
import numpy
x = torch.tensor(numpy.e, requires_grad=True)
y = torch.log(x)
y.backward()
print(x.grad)

The result is tensor(0.3679), as expected: this is 1 / x, the derivative of log(x) with respect to x, evaluated at x = numpy.e. However, if I run the last 3 lines again WITHOUT re-assigning x, i.e. do

y = torch.log(x)
y.backward()
print(x.grad)

then I get tensor(0.7358), which is twice the previous result. Why does this happen?


Solution

  • Gradients are accumulated until cleared. From the docs (emphasis mine):

    This function accumulates gradients in the leaves - you might need to zero them before calling it.

    This zeroing can be done by way of x.grad.zero_() or, in the case of a torch.optim.Optimizer, optim.zero_grad().
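    A minimal sketch of the fix, reusing the question's own setup: zeroing x.grad in place between backward passes prevents the gradients from adding up.

    ```python
    import torch
    import numpy

    x = torch.tensor(numpy.e, requires_grad=True)

    # First backward pass: x.grad holds d(log x)/dx = 1/x = 1/e
    y = torch.log(x)
    y.backward()
    print(x.grad)  # tensor(0.3679)

    # Clear the accumulated gradient before backpropagating again
    x.grad.zero_()

    # Second pass now yields 1/e again, not 2/e
    y = torch.log(x)
    y.backward()
    print(x.grad)  # tensor(0.3679)
    ```

    Without the x.grad.zero_() call, the second pass would print tensor(0.7358), exactly as observed in the question, because .backward() adds each new gradient onto whatever is already stored in x.grad.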