I am writing code for a PINN model. While computing the gradients for the PDE loss, I used torch.autograd.grad(), but it raises
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
for the line
dphidx = torch.autograd.grad(train_output[:, 0], X_train_tensor[:,0], torch.ones_like(train_output[:, 0]), create_graph=True)[0]
I checked that both train_output[:, 0] and X_train_tensor[:, 0] have requires_grad=True, so I am confused about what is wrong here.
I am attaching the model's code for clarity:
import torch
import torch.nn as nn

class PINNFP(nn.Module):
    def __init__(self):
        super().__init__()
        # Fully connected network: 3 input features -> 2 outputs
        self.manual_layers = nn.Sequential(
            nn.Linear(in_features=3, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=2))

    def forward(self, x):
        return self.manual_layers(x)

model_1 = PINNFP()
train_output = model_1(X_train_tensor)
dphidx = torch.autograd.grad(train_output[:, 0], X_train_tensor[:, 0],
                             torch.ones_like(train_output[:, 0]),
                             create_graph=True)[0]
How can I fix this error? I tried allow_unused=True, but in that case the gradient comes back as None, which I do not want.
You have to change X_train_tensor[:,0] to X_train_tensor:
import torch
import torch.nn as nn

class PINNFP(nn.Module):
    def __init__(self):
        super().__init__()
        self.manual_layers = nn.Sequential(
            nn.Linear(in_features=3, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=5),
            nn.Linear(in_features=5, out_features=2))

    def forward(self, x):
        return self.manual_layers(x)

model_1 = PINNFP()
X_train_tensor = torch.randn(8, 3, requires_grad=True)
train_output = model_1(X_train_tensor)
dphidx = torch.autograd.grad(train_output[:, 0], X_train_tensor,
                             torch.ones_like(train_output[:, 0]))[0]
Slicing creates a new tensor with its own node in the computational graph: X_train_tensor[:,0] stems directly from X_train_tensor, and it is not the tensor that was fed into the model. This means there is no path in the graph from train_output[:,0] back to X_train_tensor[:,0], so autograd correctly reports it as unused. What you can do instead is backpropagate from train_output[:,0] to the full X_train_tensor and take a slice of the resulting gradient, as in the sketch below.
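For example, here is a minimal sketch of that slicing step, keeping create_graph=True from the original question so the result can be differentiated again for the PDE loss (the assumption that column 0 of X_train_tensor is the x coordinate is mine):

# Gradient of the first output w.r.t. the whole input tensor;
# create_graph=True keeps the graph alive for higher-order derivatives.
grads = torch.autograd.grad(train_output[:, 0], X_train_tensor,
                            torch.ones_like(train_output[:, 0]),
                            create_graph=True)[0]  # same shape as X_train_tensor
dphidx = grads[:, 0]  # assumed: derivative w.r.t. the x coordinate (column 0)

Slicing the gradient after the call is safe, because grads is an ordinary tensor of shape (N, 3) whose columns are the partial derivatives with respect to each input feature.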