Consider the following two contexts:
    with torch.no_grad():
        params = params - learning_rate * params.grad
and
    with torch.no_grad():
        params -= learning_rate * params.grad
In the second case, .backward() runs smoothly, while the first case raises

    RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What is the reason for this, given that it is normal to use x -= a and x = x - a interchangeably?
Note that x -= a and x = x - a cannot be used interchangeably: the latter creates a new tensor and assigns it to the variable x, while the former performs an in-place operation on the existing tensor.
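Here is a minimal sketch of the difference (the tensor and the values are illustrative):

    import torch

    x = torch.ones(3, requires_grad=True)

    with torch.no_grad():
        y = x                   # second reference to the original tensor
        x -= 0.1                # in-place: mutates the original tensor
        print(x is y)           # True  -- still the same tensor object
        print(x.requires_grad)  # True  -- unchanged by the in-place update

        x = x - 0.1             # out-of-place: builds a brand-new tensor
        print(x is y)           # False -- x now names a different tensor
        print(x.requires_grad)  # False -- the new tensor was created under no_grad

So after x = x - 0.1 inside a no_grad block, the name x points to a tensor that is detached from autograd.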
Therefore, with

    with torch.no_grad():
        params -= learning_rate * params.grad

everything works fine in your optimization loop, while in

    with torch.no_grad():
        params = params - learning_rate * params.grad

the variable params gets rebound to a new tensor. Since this new tensor was created within a torch.no_grad() context, it has params.requires_grad=False. As a result, the next forward pass records no computation graph, so on the next iteration calling .backward() on the loss raises the RuntimeError above: the loss neither requires grad nor has a grad_fn.
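For completeness, here is a minimal sketch of a working optimization loop using the in-place update; the toy data and linear model are illustrative, not taken from the question:

    import torch

    # Toy data for a linear relationship y = 3x + 1 (illustrative).
    t_x = torch.randn(100, 1)
    t_y = 3.0 * t_x + 1.0

    params = torch.tensor([1.0, 0.0], requires_grad=True)
    learning_rate = 1e-2

    for epoch in range(100):
        # The forward pass builds a graph because params.requires_grad is True.
        w, b = params[0], params[1]
        loss = ((w * t_x + b - t_y) ** 2).mean()

        loss.backward()

        with torch.no_grad():
            # The in-place update keeps params the same leaf tensor,
            # so requires_grad stays True for the next iteration.
            params -= learning_rate * params.grad
            # Reset the accumulated gradients before the next backward pass.
            params.grad.zero_()

Swapping the in-place update for params = params - learning_rate * params.grad reproduces the RuntimeError on the second iteration.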