I have been using PyTorch for a while now. One question I have regarding backprop is as follows:
let's say we have a loss function for a neural network. To do backprop, I have seen two different versions. One like:
optimizer.zero_grad()
autograd.backward(loss)
optimizer.step()
and the other one like:
optimizer.zero_grad()
loss.backward()
optimizer.step()
Which one should I use? Is there any difference between these two versions?
As a last question: do we need to specify requires_grad=True for the parameters of every layer of our network to make sure their gradients are computed during backprop? For example, do I need to specify it for the layer nn.Linear(hidden_size, output_size) inside my network, or is it automatically set to True by default?
So, just a quick answer: autograd.backward(loss) and loss.backward() are actually the same. Just look at the implementation of tensor.backward() (as your loss is just a tensor): tensor.backward() simply calls autograd.backward(tensor).
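If you want to convince yourself, here is a minimal sketch that runs both variants on the same data and checks that the resulting .grad values match. The model, data, and loss function are made up for illustration:

import torch
from torch import nn, autograd

model = nn.Linear(3, 1)
x = torch.randn(4, 3)
target = torch.randn(4, 1)
loss_fn = nn.MSELoss()

# Variant 1: call backward() on the loss tensor.
model.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()
grads_1 = [p.grad.clone() for p in model.parameters()]

# Variant 2: pass the same loss tensor to autograd.backward().
model.zero_grad()
loss = loss_fn(model(x), target)
autograd.backward(loss)
grads_2 = [p.grad.clone() for p in model.parameters()]

# Both variants populate .grad with identical values.
for g1, g2 in zip(grads_1, grads_2):
    assert torch.allclose(g1, g2)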
As to your second question: whenever you use a prefabricated layer such as nn.Linear, or convolutions, or RNNs, etc., all of them rely on nn.Parameter attributes to store the parameter values. And, as the docs say, these default to requires_grad=True.
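A small sketch to see this (the layer sizes are arbitrary): it inspects the requires_grad flag on a prefabricated layer and confirms its parameters receive gradients without any manual setup:

import torch
from torch import nn

layer = nn.Linear(10, 2)
for name, param in layer.named_parameters():
    print(name, param.requires_grad)   # weight True, bias True

# Gradients are computed for these parameters automatically.
out = layer(torch.randn(5, 10)).sum()
out.backward()
print(layer.weight.grad.shape)         # torch.Size([2, 10])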
Update to a follow-up in the comments: What happens to a tensor during the backward pass depends on whether it lies on the computation path between the "output" and a leaf variable, or not. If not, it is not entirely clear what backprop should compute for it - after all, the entire purpose is to compute gradients for parameters, i.e., leaf variables. If the tensor is on that path, its gradients will generally be computed automatically. For a more thorough discussion, see this question and this tutorial from the docs.
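A short sketch of that distinction, with made-up tensor names: leaf tensors get their gradients stored in .grad, while intermediate tensors on the path have their gradients computed but not retained unless you ask for them with retain_grad():

import torch

w = torch.randn(3, requires_grad=True)   # leaf variable, like a parameter
x = torch.randn(3)                       # plain input, no gradient needed
h = w * x                                # intermediate tensor on the path
loss = h.sum()
loss.backward()

print(w.is_leaf, w.grad)   # True, gradient stored on the leaf (equals x here)
print(h.is_leaf)           # False: h's gradient is used but not retained

# To keep the gradient of an intermediate tensor, request it explicitly:
w.grad = None
h = w * x
h.retain_grad()
h.sum().backward()
print(h.grad)              # now populated (a vector of ones, since loss = h.sum())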