Tags: neural-network, deep-learning, pytorch, backpropagation

How to do backprop in Pytorch (autograd.backward(loss) vs loss.backward()) and where to set requires_grad=True?


I have been using Pytorch for a while now. One question I had regarding backprop is as follows:

Let's say we have a loss function for a neural network. To do backprop, I have seen two different versions. One like:

optimizer.zero_grad()
autograd.backward(loss)
optimizer.step()

and the other one like:

optimizer.zero_grad()
loss.backward()
optimizer.step()

Which one should I use? Is there any difference between these two versions?

As a last question, do we need to specify requires_grad=True for the parameters of every layer of our network to make sure their gradients are computed during backprop?

For example, do I need to specify it for the layer nn.Linear(hidden_size, output_size) inside my network, or is it automatically set to True by default?


Solution

  • So, just a quick answer: autograd.backward(loss) and loss.backward() are actually the same. Just look at the implementation of tensor.backward() (your loss is just a tensor): it simply calls autograd.backward(tensor) on itself.
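
    To illustrate the equivalence, here is a minimal sketch (the tensor and variable names are my own, not from the original post) showing that both calls populate the same .grad fields:

    import torch

    # A toy "loss" built from a leaf tensor
    x = torch.randn(3, requires_grad=True)

    loss = (x * 2).sum()
    loss.backward()                       # method form: Tensor.backward()
    grad_via_method = x.grad.clone()

    x.grad = None                         # reset the accumulated gradient
    loss = (x * 2).sum()
    torch.autograd.backward(loss)         # function form: autograd.backward(loss)
    grad_via_function = x.grad.clone()

    print(torch.equal(grad_via_method, grad_via_function))  # True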

    As to your second question: whenever you use a prefabricated layer such as nn.Linear, convolutions, RNNs, etc., all of them rely on nn.Parameter attributes to store the parameter values. And, as the docs say, these default to requires_grad=True.
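
    If you want to convince yourself, a quick check along these lines should do (the layer sizes here are just placeholders):

    import torch.nn as nn

    layer = nn.Linear(10, 2)
    for name, param in layer.named_parameters():
        print(name, param.requires_grad)
    # weight True
    # bias True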

    Update to a follow-up in the comments: what happens to a tensor during the backward pass depends on whether it lies on the computation path between the "output" and a leaf variable, or not. If it does not, it is not entirely clear what backprop should compute - after all, the entire purpose is to compute gradients for parameters, i.e., leaf variables. If the tensor is on that path, all gradients will generally be computed automatically. For a more thorough discussion, see this question and this tutorial from the docs.
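
    As a rough sketch of that distinction (the names are illustrative, not from the post): leaf tensors get their .grad populated by backward(), while an intermediate tensor on the path only keeps a gradient if you explicitly ask for it with retain_grad():

    import torch

    w = torch.randn(3, requires_grad=True)   # leaf variable, like a parameter
    h = w * 2                                # intermediate tensor on the path
    h.retain_grad()                          # keep its gradient as well
    loss = h.sum()
    loss.backward()

    print(w.is_leaf, w.grad)   # True, gradient populated for the leaf
    print(h.is_leaf, h.grad)   # False, gradient kept only due to retain_grad()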