
Gradients not populating as expected


Sorry, I know questions of this sort have been asked a lot, but I still don't understand the behavior of autograd.

A simple example is below:

import torch

# n, data (an iterable of length-n feature vectors) and y (the labels) are defined elsewhere
ce_loss = torch.nn.BCELoss()
par = torch.randn((1, n), requires_grad=True)
act = torch.nn.Sigmoid()

y_hat = []
for obs in data:
    y_hat.append(act(par @ obs))

loss = ce_loss(torch.tensor(y_hat, requires_grad=True), y)
loss.backward()

After calling backward(), par.grad is still None (even though par is a leaf tensor with requires_grad=True).

Any tips?


Solution

  • It is because torch.tensor(...) creates a new leaf of the computational graph. By definition, the operations that produced the values passed to torch.tensor are cut off: the graph no longer knows about the computations involving par, so its gradients are never computed. Adding requires_grad=True doesn't change anything, because it still creates a leaf (which receives its own gradients) that, again by definition of a leaf, has no memory of the earlier operations. (A loop-preserving fix using torch.stack is sketched after the code below.)

    I suggest another way to do the computation that avoids iterating over the data and uses native batched parallelism:

    batch_size, n = 8, 10  # or something else
    
    # Random data and labels to reproduce the code
    data = torch.randn((batch_size, n))
    y = torch.randint(0, 2, (batch_size, )).float()  # random binary labels in {0, 1} (valid BCELoss targets)
    
    y = y.unsqueeze(1)  # size (batch_size, 1)
    ce_loss = torch.nn.BCELoss()
    par = torch.randn((1, n), requires_grad=True)
    act = torch.nn.Sigmoid()
    
    y_hat = act(data @ par.T)  # compute all predictions in parallel
    
    loss = ce_loss(y_hat, y)  # automatically reduced to scalar (mean)
    loss.backward()
    
    print(par.grad)  # no longer None!
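
    If you would rather keep the loop from the question, here is a minimal sketch of a fix (reusing data, y, ce_loss, par and act from the snippet above): assemble y_hat with torch.stack instead of torch.tensor. stack is a differentiable operation that is recorded in the graph, so the connection to par is preserved.

    par.grad = None  # clear the gradient accumulated by the previous backward()

    y_hat_list = []
    for obs in data:                           # data has shape (batch_size, n), so obs has shape (n,)
        y_hat_list.append(act(par @ obs))      # each entry keeps its graph history

    y_hat_loop = torch.stack(y_hat_list)       # shape (batch_size, 1), still connected to par
    loss_loop = ce_loss(y_hat_loop, y)
    loss_loop.backward()

    print(par.grad)  # populated, because the graph was never broken

    torch.cat(y_hat_list) would work just as well here; the key point is that stack/cat build the result inside the graph, whereas the torch.tensor constructor starts a new, disconnected leaf.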