I am new to PyTorch, and I am trying to run some importance-sampling experiments: during an evaluation epoch, I calculate the loss for each training sample and obtain the sum of the gradients it induces. Finally, I sort the training samples by the gradient sums they introduce. For example, if sample A shows a very high gradient sum, it must be an important sample for training; otherwise, it is not a very important sample.
Note that the gradients calculated here will not be used to update parameters. In other words, they are only used for selecting important samples.
I know the gradients will be available somewhere after loss.backward(). But what is the easiest way to grab the summed gradients over the entire model? In my current implementation, I am only allowed to modify one small module where only loss is available, so I don’t have “inputs” or a “model”. Is it possible to get the gradients from only “loss”?
Gradients after backward() are stored in the grad attribute of tensors that require grad. You can find all tensors involved and sum up their grads. A cleaner way might be to write a backward hook to accumulate gradients to some global variable while backpropagating.
An example of reading the grad attribute directly:
import torch
import torch.nn as nn
model = nn.Linear(5, 3)
print(model.weight.grad) # None, since the grads have not been computed yet
print(model.bias.grad)
x = torch.randn(5, 5)
y = model(x)
loss = y.sum()
loss.backward()
print(model.weight.grad)
print(model.bias.grad)
Output:
None
None
tensor([[-0.6164, 1.1585, -3.4117, -4.3192, -3.7273],
[-0.6164, 1.1585, -3.4117, -4.3192, -3.7273],
[-0.6164, 1.1585, -3.4117, -4.3192, -3.7273]])
tensor([5., 5., 5.])
As you see, you can access the gradients as param.grad. If model is a torch.nn.Module object, you can iterate over its parameters with for param in model.parameters().
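Building on that, here is a minimal sketch of summing the gradient magnitudes over all parameters after a backward pass. The helper name grad_sum is illustrative, not a PyTorch API:

```python
import torch
import torch.nn as nn

def grad_sum(model):
    # Sum of absolute gradient values over every parameter of the model.
    # Assumes loss.backward() has already been called; parameters whose
    # grad is still None (never involved in the loss) are skipped.
    total = 0.0
    for param in model.parameters():
        if param.grad is not None:
            total += param.grad.abs().sum().item()
    return total

model = nn.Linear(5, 3)
x = torch.randn(5, 5)
loss = model(x).sum()
loss.backward()
print(grad_sum(model))  # one scalar: the gradient mass this batch induced
```

For the importance-sampling use case, you would call model.zero_grad() between samples so each sample's score reflects only its own gradients.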
Maybe you can also work with backward hooks, but I am not familiar enough with them to give a polished example.
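That said, a rough sketch of the hook approach, assuming you can touch the parameters once to register Tensor.register_hook on each (the accumulator name is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 3)

# Global accumulator the hooks write into during backpropagation.
grad_total = {'sum': 0.0}

def make_hook(store):
    def hook(grad):
        # Called with the gradient of the tensor when backward() reaches it.
        store['sum'] += grad.abs().sum().item()
    return hook

for param in model.parameters():
    param.register_hook(make_hook(grad_total))

x = torch.randn(5, 5)
loss = model(x).sum()
loss.backward()
print(grad_total['sum'])  # accumulated during the backward pass itself
```

The advantage over reading param.grad afterwards is that the accumulation happens inside backward(), so the module that only sees loss just calls loss.backward() and reads the global; remember to reset the accumulator between samples.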