I'm trying to compute the gradient of my loss function with respect to my model parameters in PyTorch.
That is, let u(x; θ) be the model, where x is the input (in R^n) and θ are the model parameters. I'm trying to compute du/dθ.
For a "simple" loss function, this is not a problem, but my loss function depends on the gradient of the model with respect to its inputs (i.e., du/dx
). When I attempt to do this, I'm met with the following error message: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Here is a minimal example to illustrate the issue:
import torch
import torch.nn as nn
from torch.autograd import grad

model = nn.Sequential(nn.Linear(1, 10), nn.Tanh(), nn.Linear(10, 1))

def loss1(x, u):
    # "Simple" loss: depends only on the model output u.
    return torch.mean(u)

def loss2(x, u):
    # Loss that depends on du/dx; create_graph=True so that d_u_x can
    # itself be differentiated with respect to the parameters.
    d_u_x = grad(u, x, torch.ones_like(u), retain_graph=True, create_graph=True)[0]
    return torch.mean(d_u_x)

x = torch.randn(10, 1)
x.requires_grad_()  # needed so u can be differentiated with respect to x
u = model(x)
loss = loss2(x, u)
d_loss_params = grad(loss, model.parameters(), retain_graph=True)  # raises the error above
If I change the second-to-last line to read loss = loss1(x, u), things work as expected.
Update: it appears to work if I set bias=False for the nn.Linears. OK, that makes some sense: the bias of the final nn.Linear never appears in du/dx (for u = W2 tanh(W1 x + b1) + b2, we have du/dx = W2 diag(1 - tanh^2(W1 x + b1)) W1, which involves b1 but not b2), so that parameter is genuinely unused in the graph of loss2. But that raises the question: how do I restrict the gradient computation to only the parameters that are actually used?
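For example, passing allow_unused=True by itself already reveals which parameters are unused, since their entries come back as None (a quick sketch using the model and loss from above):

params = list(model.parameters())
grads = grad(loss, params, retain_graph=True, allow_unused=True)
for (name, _), g in zip(model.named_parameters(), grads):
    # Only the final layer's bias ('2.bias') should come back as None here.
    print(name, "unused" if g is None else "used")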
This was solved by passing allow_unused=True and materialize_grads=True to grad. That is:

d_loss_params = grad(loss, model.parameters(), retain_graph=True, allow_unused=True, materialize_grads=True)
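With materialize_grads=True (available in recent PyTorch versions), the unused entries are returned as zero tensors of the appropriate shape instead of None, so the results line up one-to-one with the parameters. A quick check under the same setup as above:

d_loss_params = grad(loss, model.parameters(), retain_graph=True,
                     allow_unused=True, materialize_grads=True)
for (name, _), g in zip(model.named_parameters(), d_loss_params):
    # The final layer's bias now gets an all-zero gradient instead of None.
    print(name, tuple(g.shape), g.abs().sum().item())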
See the discussion at https://discuss.pytorch.org/t/gradient-of-loss-that-depends-on-gradient-of-network-with-respect-to-parameters/217275 for more details.