python, machine-learning, deep-learning, pytorch, recurrent-neural-network

Why does my RNN not converge on a simple task?


I want to create a recurrent model to solve the simplest sequence I know: an arithmetic progression. With a as the base and d as the step size, the sequence is as follows:

a, a+d, a+2d, a+3d, a+4d, ...

To solve this, with h denoting the hidden state, the model only has to learn a simple 2*2 matrix. In effect, this sets h1 = t0, so the hidden state carries the previous term of the sequence.

[image of the matrix equation, not available]

In other words, you can also view it like this:

[image of an alternative form of the same equation, not available]
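Since the images are not available, here is one matrix that is consistent with the description above. It is a reconstruction under the assumption that the layer maps the pair (h_i, t_i) to (h_{i+1}, predicted t_{i+1}) and that h_{i+1} = t_i, as in h1 = t0; the original images may have used a different but equivalent form (for example, a rescaled hidden state).

```latex
\begin{pmatrix} h_{i+1} \\ \hat{t}_{i+1} \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix}
\begin{pmatrix} h_i \\ t_i \end{pmatrix},
\qquad
h_i = t_{i-1}
\;\Longrightarrow\;
\hat{t}_{i+1} = 2\,t_i - t_{i-1} = t_i + d
```

The second row works because t_i - t_{i-1} = d for an arithmetic progression, so it only holds once the hidden state actually contains the previous term, that is, from the second step onward.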

So this model with a 2*2 fully connected layer should be able to learn this matrix:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x

But to my surprise, it does not converge! There must be something wrong with my setup, and I would appreciate help finding it. I suspect the problem is in my training loop.

P.S. I have intentionally set the batch size to 1 for now. I want to handle padding of the input data later. The model should learn without batches anyway.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import numpy as np

class CustomDataset(Dataset):
    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        a0 = (np.random.rand() - 0.5) * 200
        d = (np.random.rand() - 0.5) * 40
        length = np.random.randint(2, MAX_Length_sequence + 1)

        sequence = np.arange(length) * d + a0
        next_number = sequence[-1] + d

        return length, torch.tensor(sequence, dtype=torch.float32), torch.tensor(next_number, dtype=torch.float32)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x

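# Hyperparameters
EPOCHS = 10
BATCH_SIZE = 1
LEARNING_RATE = 0.001
DATASET_SIZE = 10000
MAX_Length_sequence = 10  # assumed value; used by CustomDataset but not defined in the original post
criterion = nn.MSELoss()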

# Model
model = Model()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

My training loop:

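for epoch in range(EPOCHS):
    # A fresh dataset each epoch; samples are drawn randomly in __getitem__
    dataset = CustomDataset(DATASET_SIZE)
    dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
    model.train()
    total_loss = 0

    for length, sequence, next_number in dataloader:
        optimizer.zero_grad()
        loss = 0
        h = torch.zeros(BATCH_SIZE)  # initial hidden state
        steps = int(length.item())   # length arrives as a 1-element tensor

        for i in range(steps):
            # Input is [hidden state, current term]
            x = torch.cat([h, sequence[0, i].unsqueeze(0)])
            # Target is the next term; after the last term it is next_number
            y = sequence[0, i + 1] if i != steps - 1 else next_number[0]

            output = model(x)
            h, y_hat = output[0].unsqueeze(0), output[1]

            # Sum the loss over every step of the sequence
            loss += criterion(y_hat, y)

        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')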

Solution

  • I solved it by computing the loss from only the last output instead of summing the losses from every step. That fixed my issue, but I still don't understand why my first approach doesn't work!

    for epoch in range(EPOCHS):
        dataset = CustomDataset(DATASET_SIZE)
        dataloader = DataLoader(dataset, batch_size=BATCH_SIZE)
        model.train()
        total_loss = 0

        for length, sequence, next_number in dataloader:
            optimizer.zero_grad()
            h = torch.zeros(BATCH_SIZE)  # initial hidden state
            steps = int(length.item())   # length arrives as a 1-element tensor

            for i in range(steps):
                x = torch.cat([h, sequence[0, i].unsqueeze(0)])
                output = model(x)  # single forward pass per step
                h = output[0].unsqueeze(0)

                # Only the prediction after the last term contributes to the loss
                if i == steps - 1:
                    loss = criterion(output[1], next_number[0])

            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(f'Epoch {epoch+1}, Loss: {total_loss/len(dataloader)}')
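One possible explanation for the difference, offered as a sketch rather than a confirmed diagnosis: at the first step the hidden state is zero, so the model has to predict a + d from a alone, which no fixed matrix can do for arbitrary d. The plain-Python block below rolls a candidate matrix over one sequence and records the squared error at each step. The matrix (h_next = t, y_hat = 2t - h) and the helper name `per_step_squared_errors` are assumptions made for illustration, since the original images of the matrix are missing.

```python
# Candidate weights (an assumption, not recovered from the original post):
# row 0 gives the next hidden state, row 1 gives the prediction.
W = [[0.0, 1.0],
     [-1.0, 2.0]]


def per_step_squared_errors(a, d, length):
    """Apply W to a, a+d, ... and return the squared error at every step."""
    # One extra term so the last step also has a target (next_number)
    sequence = [a + k * d for k in range(length + 1)]
    h = 0.0  # same zero initial hidden state as in the training loop
    errors = []
    for i in range(length):
        t = sequence[i]
        h_next = W[0][0] * h + W[0][1] * t
        y_hat = W[1][0] * h + W[1][1] * t
        errors.append((y_hat - sequence[i + 1]) ** 2)
        h = h_next
    return errors


errors = per_step_squared_errors(a=10.0, d=3.0, length=5)
print(errors)  # prints [49.0, 0.0, 0.0, 0.0, 0.0]
```

With this matrix the first-step error is (a - d)^2 and every later step is exact. A loss summed over all steps therefore always contains a term that this matrix cannot drive to zero, and its gradient pulls the weights away from the matrix, which may be why the first approach fails to converge while the last-output loss does not.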