I am trying to understand how PyTorch backpropagation works, using the following code.
import torch
import numpy
x = torch.tensor(numpy.e, requires_grad=True)
y = torch.log(x)
y.backward()
print(x.grad)
The result is tensor(0.3679), as expected: this is 1 / x, the derivative of log(x) with respect to x, evaluated at x = numpy.e (1 / e ≈ 0.3679).
However, if I run the last 3 lines again WITHOUT re-assigning x, i.e. do
y = torch.log(x)
y.backward()
print(x.grad)
then I will get tensor(0.7358), which is twice the previous result.
Why does this happen?
Gradients are accumulated until cleared. From the docs (emphasis mine):
This function *accumulates gradients in the leaves* - you might need to zero them before calling it.
This zeroing can be done by way of x.grad.zero_() or, in the case of a torch.optim.Optimizer, optim.zero_grad().
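For concreteness, here is the question's setup run end to end with the gradient zeroed between the backward passes; the SGD optimizer at the end is just an illustrative choice, since any torch.optim.Optimizer constructed over [x] would behave the same way:

import torch
import numpy

x = torch.tensor(numpy.e, requires_grad=True)

# First backward pass: x.grad starts out empty, so it simply receives 1/e.
y = torch.log(x)
y.backward()
print(x.grad)  # tensor(0.3679), i.e. 1/e

# Second backward pass without zeroing: another 1/e is ADDED to x.grad.
y = torch.log(x)
y.backward()
print(x.grad)  # tensor(0.7358), i.e. 2/e

# Zero the accumulated gradient in place, then backpropagate again.
x.grad.zero_()
y = torch.log(x)
y.backward()
print(x.grad)  # tensor(0.3679), back to 1/e

# Equivalent via an optimizer (SGD chosen arbitrarily for illustration):
optim = torch.optim.SGD([x], lr=0.1)
optim.zero_grad()

Note that in recent PyTorch versions optim.zero_grad() sets the gradients to None by default rather than filling them with zeros, but in both cases the stale value is discarded before the next backward().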