python · pytorch · gradient · backpropagation · automatic-differentiation

How to get the gradients of network parameters for a derivative-based loss?


I have a network that models y(x), but my dataset only contains dy(x). That is, I know the derivative of y at each x, but I do not know y itself. A minimal example of this is:

import torch

# Define network to predict y(x)
network = torch.nn.Sequential(
    torch.nn.Linear(1, 50),
    torch.nn.Tanh(),
    torch.nn.Linear(50, 1)
)

# Define dataset dy(x) = x, which corresponds to y = 0.5x^2
x = torch.linspace(0, 1, 100).reshape(-1, 1)
dy = x

# Calculate loss based on derivative of prediction for y
x.requires_grad = True
y_pred = network(x)
dy_pred = torch.autograd.grad(y_pred, x, grad_outputs=torch.ones_like(y_pred), create_graph=True)[0]
loss = torch.mean((dy-dy_pred)**2)

# This throws an error
gradients = torch.autograd.grad(loss, network.parameters())[0]

The last line raises the error "One of the differentiated Tensors appears to not have been used in the graph", even though the parameters were definitely used to compute the loss. Interestingly, when I instead call loss.backward() and optimize with torch.optim.Adam, no error occurs. How can I fix this? Note: defining a network that predicts dy directly is not an option for my actual problem.
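
For reference, the optimizer-based variant that runs without an error looks roughly like this (the learning rate and number of steps are arbitrary choices for illustration):

import torch

# (network, x, dy defined as above; x.requires_grad is already True)
optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)  # lr chosen arbitrarily

for _ in range(100):
    optimizer.zero_grad()
    y_pred = network(x)
    dy_pred = torch.autograd.grad(y_pred, x, grad_outputs=torch.ones_like(y_pred),
                                  create_graph=True)[0]
    loss = torch.mean((dy - dy_pred)**2)
    # backward() simply leaves .grad = None for any parameter not in the graph,
    # and Adam skips parameters whose .grad is None, so no error is raised
    loss.backward()
    optimizer.step()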


Solution

  • I figured out the issue: the bias of the output neuron does not contribute to the loss, so it is never used in the graph. The bias only adds a constant to y_pred, so it drops out when differentiating with respect to x and therefore never appears in the graph of dy_pred. The fix is to eliminate this bias parameter by passing bias=False to the last layer (a short check follows the snippet below).

    # Define network to predict y(x)
    network = torch.nn.Sequential(
        torch.nn.Linear(1, 50),
        torch.nn.Tanh(),
        torch.nn.Linear(50, 1, bias=False)
    )
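
    With the bias-free output layer, every parameter participates in the graph of dy_pred and the call succeeds. As a minimal check (reusing the dataset dy(x) = x from the question), torch.autograd.grad(loss, network.parameters()) now returns a tuple with one gradient per parameter, so indexing with [0] would only give the gradient of the first layer's weights:

    import torch

    # Bias-free output layer so every parameter enters the derivative graph
    network = torch.nn.Sequential(
        torch.nn.Linear(1, 50),
        torch.nn.Tanh(),
        torch.nn.Linear(50, 1, bias=False)
    )

    x = torch.linspace(0, 1, 100).reshape(-1, 1)
    dy = x
    x.requires_grad = True

    y_pred = network(x)
    dy_pred = torch.autograd.grad(y_pred, x, grad_outputs=torch.ones_like(y_pred),
                                  create_graph=True)[0]
    loss = torch.mean((dy - dy_pred)**2)

    # One gradient tensor per parameter, in the order of network.parameters()
    params = list(network.parameters())
    gradients = torch.autograd.grad(loss, params)
    for p, g in zip(params, gradients):
        print(p.shape, g.shape)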