I am implementing a simple feedforward neural network with PyTorch and the loss function does not seem to decrease. Based on some other tests I have done, the problem seems to be in the computation of pred: if I slightly change the network so that it outputs a 2-dimensional vector for each entry and save that directly as pred, everything works perfectly.
Do you see the problem in how pred is defined here? Thanks
import torch
import numpy as np
from torch import nn

dt = 0.1

class Neural_Network(nn.Module):
    def __init__(self):
        super(Neural_Network, self).__init__()
        self.l1 = nn.Linear(2, 300)
        self.nl = nn.Tanh()
        self.l2 = nn.Linear(300, 1)

    def forward(self, X):
        z = self.l1(X)
        z = self.nl(z)
        o = self.l2(z)
        return o

N = 1000
X = torch.rand(N, 2, requires_grad=True)
y = torch.rand(N, 1)

NN = Neural_Network()
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(NN.parameters(), lr=1e-5)

epochs = 200
for i in range(epochs):  # trains the NN 200 times
    HH = torch.mean(NN(X))
    gradH = torch.autograd.grad(HH, X)[0]
    # build the Hamiltonian vector field (dH/dx2, -dH/dx1) for each point
    XH = torch.cat((gradH[:, 1].unsqueeze(0), -gradH[:, 0].unsqueeze(0)), dim=0).t()
    pred = X + dt * XH
    # Optimize and improve the weights
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print("Loss: ", loss.detach().numpy())  # sum of squared errors (reduction='sum')
P.S. With these X and y the loss is not expected to go to zero; I have added them here just for simplicity. I will apply this architecture to data points which are expected to satisfy this model. For now I am only interested in seeing the loss decrease.
My aim is to approximate with a neural network the Hamiltonian of a vector field where only some trajectory is known, for example only the updates $x(t) \rightarrow x(t+\Delta t)$ for some choice of points. So the vector X contains the points $x(t)$, while y contains the $x(t+\Delta t)$. My network above approximates in a simple way the Hamiltonian function $H(x)$, and in order to optimize it I need to find the trajectories associated with this Hamiltonian. In particular, XH aims to be the Hamiltonian vector field associated with the approximated Hamiltonian, and the time update pred = X + dt*XH is simply one step of forward Euler.
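To spell out the step (with the sign convention I am assuming): in 2D the Hamiltonian vector field is $X_H(x) = (\partial H/\partial x_2, -\partial H/\partial x_1)$, which is what the concatenation of gradH[:,1] and -gradH[:,0] builds, and one forward Euler step is $x(t+\Delta t) \approx x(t) + \Delta t\, X_H(x(t))$, i.e. pred = X + dt*XH.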
However, my main issue here can be abstracted as: how can I involve the gradient of a network with respect to its inputs in the loss function?
This is probably because the gradient flow graph for NN is destroyed at the gradH step (compare HH.grad_fn with gradH.grad_fn). So your pred tensor (and the subsequent loss) does not contain the necessary gradient flow through the NN network.
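A quick way to see this (a small check, not in the original code), right after gradH is computed:

print(HH.grad_fn)     # something like <MeanBackward0 object at ...>
print(gradH.grad_fn)  # None: autograd.grad() without create_graph=True returns a result detached from the graph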
The loss contains gradient flow for the input X, but not for NN.parameters(). Because the optimizer only takes a step() over those NN.parameters(), the network NN is not being updated, and since X is not being updated either, the loss does not change.
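You can also confirm this directly right after loss.backward() (a minimal check, assuming the setup from the question):

print(X.grad)  # populated: the gradient does reach the input X
for name, p in NN.named_parameters():
    print(name, p.grad)  # None on the first iteration: nothing reaches the parameters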
You can check how the loss sends its gradients backward by inspecting loss.grad_fn after loss.backward(), and here's a neat function (found on Stack Overflow) to do it:
def getBack(var_grad_fn):
    # recursively walk the autograd graph and print every leaf tensor
    # that carries a gradient, together with that gradient
    print(var_grad_fn)
    for n in var_grad_fn.next_functions:
        if n[0]:
            try:
                tensor = getattr(n[0], 'variable')
                print(n[0])
                print('Tensor with grad found:', tensor)
                print(' - gradient:', tensor.grad)
                print()
            except AttributeError as e:
                getBack(n[0])
Call getBack(loss.grad_fn) after loss.backward() to check it for yourself (maybe reduce the batch size N first, though).
Edit: It works after changing the line to gradH = torch.autograd.grad(HH, X, create_graph=True)[0].
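For reference, a sketch of the corrected training step (same setup as in the question, only that one line changed):

for i in range(epochs):
    HH = torch.mean(NN(X))
    # create_graph=True keeps the graph of the gradient computation,
    # so the loss below stays connected to NN.parameters()
    gradH = torch.autograd.grad(HH, X, create_graph=True)[0]
    XH = torch.cat((gradH[:, 1].unsqueeze(0), -gradH[:, 0].unsqueeze(0)), dim=0).t()
    pred = X + dt * XH
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()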