Tags: machine-learning, neural-network, pytorch

Neural net loss rises exponentially after first backpropagation


I am training a neural network on video frames (converted to greyscale) to output a tensor with two values. The first iteration always produces an acceptable loss (mean squared error generally between 15 and 40), followed by an exponential rise on the second pass, after which the loss becomes infinite.

The net is quite vanilla:

import torch
from torch import nn

class NeuralNetwork(nn.Module):

    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(100 * 291, 29100),
            nn.ReLU(),
            nn.Linear(29100, 29100),
            nn.ReLU(),
            nn.Linear(29100, 2),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

As is the training loop:

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to("cpu"), y.to("cpu")

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model = NeuralNetwork()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

Example of loss function growth:

ITERATION 1
prediction: tensor([[-1.2239, -8.2337]], grad_fn=<AddmmBackward>)
actual:     tensor([[0.0321, 0.0325]])
loss:       tensor(34.9545, grad_fn=<MseLossBackward>)


ITERATION 2
prediction: tensor([[ 314636.5625, 2063098.2500]], grad_fn=<AddmmBackward>)
actual:     tensor([[0.0330, 0.0323]])
loss:       tensor(2.1777e+12, grad_fn=<MseLossBackward>)


ITERATION 3
prediction: tensor([[-8.0924e+22, -5.3062e+23]], grad_fn=<AddmmBackward>)
actual:     tensor([[0.0334, 0.0317]])
loss:       tensor(inf, grad_fn=<MseLossBackward>)

Here is an example of the video data: each frame is a 291x100 (width x height) greyscale image, and there are 1100 of them in the training dataset:

dataset.video_frames.size()
> torch.Size([1100, 100, 291])

dataset.video_frames[0]
> tensor([[21., 29., 28.,  ..., 33., 27., 26.],
        [22., 27., 25.,  ..., 25., 25., 30.],
        [23., 26., 26.,  ..., 24., 24., 28.],
        ...,
        [24., 33., 31.,  ..., 41., 40., 42.],
        [26., 34., 31.,  ..., 26., 20., 22.],
        [25., 32., 32.,  ..., 21., 20., 18.]])

And the labeled training data:

dataset.y.size()
> torch.Size([1100, 2])

dataset.y[0]
> tensor([0.0335, 0.0315], dtype=torch.float)

I've fiddled with the learning rate and the number of hidden layers, but nothing seems to keep the loss from going to infinity.


Solution

  • Properly scaling the inputs is crucial for training. Weights are initialized based on assumptions about how the inputs are scaled, and raw greyscale pixel values like those shown above violate those assumptions. See this part of a lecture on weight initialization to see how critical it is for proper convergence; a minimal normalization sketch is given after the reference below.

    More details on the mathematical analysis of the influence of weight initialization can be found in Sec. 2 of this paper:
    Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," ICCV 2015.
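
    As a concrete illustration, here is a minimal sketch of normalizing the frames before training. It assumes the dataset.video_frames tensor from the question holds raw greyscale pixel values (roughly 0-255); the statistics are computed from the data itself rather than being fixed constants:

    import torch

    # Scale raw pixel values down to [0, 1], then standardize to
    # approximately zero mean and unit variance. This matches the
    # scaling assumptions behind the default nn.Linear initialization.
    frames = dataset.video_frames.float() / 255.0   # shape [1100, 100, 291]
    frames = (frames - frames.mean()) / frames.std()

    With inputs on this scale, the first forward pass produces activations of moderate magnitude, so the gradients (and therefore the SGD updates) should no longer blow up.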
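
    If you also want to initialize the weights explicitly according to the analysis in that paper, here is a sketch using PyTorch's built-in Kaiming (He) initializer; the NeuralNetwork class is the one from the question:

    from torch import nn

    def init_weights(m):
        # He initialization draws weights with variance 2 / fan_in,
        # derived for ReLU activations in Sec. 2 of the paper above.
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            nn.init.zeros_(m.bias)

    model = NeuralNetwork()
    model.apply(init_weights)  # applies init_weights to every submodule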