python, pytorch, loss-function

Calculating gradient from network output in PyTorch gives error


I am trying to manually calculate a gradient from the output of my network, which I will then use in a loss function. I have managed to get an example working in Keras, but converting it to PyTorch has proven more difficult.

I have a model like:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 50)  
        self.fc2 = nn.Linear(50, 10)
        self.fc3 = nn.Linear(10, 1)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x

and some data:

x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1)
x = Variable(x)

I can then try to find a gradient like:

output = net(x)
grad = torch.autograd.grad(outputs=output, inputs=x, retain_graph=True)[0]

I want to be able to find the gradient of each point, then do something like:

err_sqr = (grad - x)**2
loss = torch.mean(err_sqr)**2

However, at the moment if I try to do this I get the error:

grad can be implicitly created only for scalar outputs

I have tried changing the shape of my network output to fix this, but if I change it too much it says the input is not part of the graph. I can get rid of that error by allowing unused inputs, but then my gradient comes back as None. I've managed to get this working in Keras, so I'm confident it's possible here too; I just need a hand!

My questions are:

  • Is there a way to "fix" what I have so that I can calculate this gradient?

Solution

  • PyTorch expects an upstream gradient in the grad call. For usual (scalar) loss functions, the upstream gradient is implicitly assumed to be 1.

    You can do a similar thing by passing ones as the upstream gradient:

    grad = torch.autograd.grad(outputs=output, inputs=x, grad_outputs=torch.ones_like(output), retain_graph=True)[0]
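  • Two further details matter for the loss you describe: torch.autograd.grad can only differentiate with respect to tensors that require grad, and if the resulting gradient is itself fed into a loss that you backpropagate, you also need create_graph=True. Here is a rough end-to-end sketch under those assumptions (written for a recent PyTorch, where Variable is no longer needed and torch.sigmoid can replace F.sigmoid):

    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(1, 50)
            self.fc2 = nn.Linear(50, 10)
            self.fc3 = nn.Linear(10, 1)

        def forward(self, x):
            x = torch.sigmoid(self.fc1(x))
            x = torch.sigmoid(self.fc2(x))
            return self.fc3(x)

    net = Net()

    # the input must require grad so autograd can differentiate with respect to it
    x = torch.unsqueeze(torch.linspace(-1, 1, 101), dim=1).requires_grad_(True)

    output = net(x)

    # ones_like(output) supplies the upstream gradient for every output element;
    # create_graph=True keeps the computed gradient differentiable so it can be
    # used inside a loss and backpropagated through
    grad = torch.autograd.grad(outputs=output, inputs=x,
                               grad_outputs=torch.ones_like(output),
                               create_graph=True)[0]

    err_sqr = (grad - x) ** 2
    loss = torch.mean(err_sqr)
    loss.backward()  # gradients of the loss now flow back into the network weights

    Note that create_graph=True implies retain_graph=True, so the separate retain_graph flag from the question is not needed here.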