PyTorch: a very simple model is not training

I am following a YouTube tutorial called PyTorch for Deep Learning & Machine Learning.

I tried to build a very simple linear regression model based on the video. The code for the model and the training loop is below, followed by the output.

For some reason, the model is not training. I passed the model's parameters to an optimiser, created a loss function, ran backpropagation, and finally updated the parameters with step(). As the output shows, the loss takes quite odd values. I can't understand why it isn't working.

# Imports
import torch
from torch import nn
from torch import optim

# Create model class
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module must be initialised before layers are assigned
        self.linear_layer = nn.Linear(in_features=1, out_features=1)
    # Forward method to define the computation in the model
    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(X)

# Set manual seed

# Create data
weight, bias = 0.7, 0.3
start, end, step = 0, 1, 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight*X + bias + 0.035*torch.randn_like(X) 

# Create train/test split
train_split = int(0.8*len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test,  y_test  = X[train_split:], y[train_split:]

# Create model
model = LinearRegressionModel()
print(model, end="\n\n")
print(model.state_dict(), end="\n\n")

# Create loss function
loss_fn = nn.L1Loss()

# Create optimiser
optimiser = optim.SGD(params=model.parameters(), lr=1e2)

# Training loop 
epochs = 200
for epoch in range(1, epochs+1):
    # Set model to training mode
    model.train()
    # Forward pass
    y_pred = model(X_train)
    # Calculate loss
    loss = loss_fn(y_pred, y_train)
    # Zero gradients in optimiser
    optimiser.zero_grad()
    # Backpropagate
    loss.backward()
    # Step model's parameters
    optimiser.step()
    ### Evaluate the current state
    with torch.inference_mode():
        test_pred = model(X_test)
        test_loss = loss_fn(test_pred, y_test)
    # Print the current state
    if epoch == 1 or epoch % 10 == 0:
        print("Epoch: {:3} | Loss: {:.2f} | Test loss {:.2f}".format(epoch,loss,test_loss))


The output:

LinearRegressionModel(
  (linear_layer): Linear(in_features=1, out_features=1, bias=True)
)

OrderedDict([('linear_layer.weight', tensor([[0.8294]])), ('linear_layer.bias', tensor([-0.5927]))])

Epoch:   1 | Loss: 0.85 | Test loss: 133.93
Epoch:  10 | Loss: 114.36 | Test loss: 0.78
Epoch:  20 | Loss: 114.36 | Test loss: 0.78
Epoch:  30 | Loss: 114.36 | Test loss: 0.78
Epoch:  40 | Loss: 114.36 | Test loss: 0.78
Epoch:  50 | Loss: 114.36 | Test loss: 0.78
Epoch:  60 | Loss: 114.36 | Test loss: 0.78
Epoch:  70 | Loss: 114.36 | Test loss: 0.78
Epoch:  80 | Loss: 114.36 | Test loss: 0.78
Epoch:  90 | Loss: 114.36 | Test loss: 0.78
Epoch: 100 | Loss: 114.36 | Test loss: 0.78
Epoch: 110 | Loss: 114.36 | Test loss: 0.78
Epoch: 120 | Loss: 114.36 | Test loss: 0.78
Epoch: 130 | Loss: 114.36 | Test loss: 0.78
Epoch: 140 | Loss: 114.36 | Test loss: 0.78
Epoch: 150 | Loss: 114.36 | Test loss: 0.78
Epoch: 160 | Loss: 114.36 | Test loss: 0.78
Epoch: 170 | Loss: 114.36 | Test loss: 0.78
Epoch: 180 | Loss: 114.36 | Test loss: 0.78
Epoch: 190 | Loss: 114.36 | Test loss: 0.78
Epoch: 200 | Loss: 114.36 | Test loss: 0.78

OrderedDict([('linear_layer.weight', tensor([[0.8294]])), ('linear_layer.bias', tensor([-0.5927]))])


  • Your code is fine apart from one setting:

    1. Learning rate: lr=1e2 is far too high. Replace it with a much smaller value; here, I've used lr=1e-2.
      optimiser = optim.SGD(params=model.parameters(), lr=1e-2)

    I got the following output:

    LinearRegressionModel(
      (linear_layer): Linear(in_features=1, out_features=1, bias=True)
    )
    OrderedDict([('linear_layer.weight', tensor([[0.8294]])), ('linear_layer.bias', tensor([-0.5927]))])
    Epoch:   1 | Loss: 0.85 | Test loss 0.77
    Epoch:  10 | Loss: 0.74 | Test loss 0.64
    Epoch:  20 | Loss: 0.63 | Test loss 0.51
    Epoch:  30 | Loss: 0.51 | Test loss 0.37
    Epoch:  40 | Loss: 0.40 | Test loss 0.24
    Epoch:  50 | Loss: 0.28 | Test loss 0.11
    Epoch:  60 | Loss: 0.17 | Test loss 0.04
    Epoch:  70 | Loss: 0.10 | Test loss 0.12
    Epoch:  80 | Loss: 0.09 | Test loss 0.16
    Epoch:  90 | Loss: 0.08 | Test loss 0.18
    Epoch: 100 | Loss: 0.07 | Test loss 0.19
    Epoch: 110 | Loss: 0.07 | Test loss 0.18
    Epoch: 120 | Loss: 0.07 | Test loss 0.18
    Epoch: 130 | Loss: 0.06 | Test loss 0.17
    Epoch: 140 | Loss: 0.06 | Test loss 0.16
    Epoch: 150 | Loss: 0.06 | Test loss 0.15
    Epoch: 160 | Loss: 0.06 | Test loss 0.14
    Epoch: 170 | Loss: 0.05 | Test loss 0.14
    Epoch: 180 | Loss: 0.05 | Test loss 0.13
    Epoch: 190 | Loss: 0.05 | Test loss 0.12
    Epoch: 200 | Loss: 0.05 | Test loss 0.12
    OrderedDict([('linear_layer.weight', tensor([[0.9241]])), ('linear_layer.bias', tensor([0.2178]))])

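    A learning rate that is too high makes each update overshoot the optimum. With lr=1e2 and L1 loss, the first step throws the parameters far past the best values and the next step throws them straight back, so the training loss alternates between 0.85 and 114.36 and the final state_dict equals the initial one. That is why it didn't work earlier.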

    I hope this helps you. Thanks!
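
To make the learning-rate point above concrete, here is a minimal sketch in plain Python (no PyTorch) of gradient descent on mean absolute error for the same kind of line fit. It is not the code from the question: the function names are hypothetical, the noise term is left out so the run is deterministic, and the starting weight and bias are simply copied from the state_dict printed in the question.

```python
# Sketch only: hand-written gradient descent on L1 (mean absolute error) loss
# for y = w*x + b. Names and structure are illustrative assumptions.

def make_data(weight=0.7, bias=0.3, n=40, step=0.02):
    """Training inputs 0.00, 0.02, ..., 0.78 and noise-free targets."""
    xs = [i * step for i in range(n)]
    ys = [weight * x + bias for x in xs]  # assumption: noise omitted
    return xs, ys

def mae_and_grads(w, b, xs, ys):
    """Return the mean absolute error and its (sub)gradients w.r.t. w and b."""
    n = len(xs)
    loss = grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        sign = (err > 0) - (err < 0)  # -1, 0 or +1
        loss += abs(err) / n
        grad_w += sign * x / n
        grad_b += sign / n
    return loss, grad_w, grad_b

def train(lr, epochs=200, w=0.8294, b=-0.5927):
    """Run plain SGD-style updates and return the loss recorded at each epoch."""
    xs, ys = make_data()
    history = []
    for _ in range(epochs):
        loss, grad_w, grad_b = mae_and_grads(w, b, xs, ys)
        history.append(loss)   # loss before this epoch's update
        w -= lr * grad_w       # same update rule as SGD without momentum
        b -= lr * grad_b
    return history

if __name__ == "__main__":
    for lr in (1e2, 1e-2):
        history = train(lr)
        print(f"lr={lr:g}: first loss {history[0]:.2f}, "
              f"last two losses {history[-2]:.2f}, {history[-1]:.2f}")
```

Under these assumptions, the lr=1e2 run should keep bouncing between a small and a very large loss, while the lr=1e-2 run should shrink steadily. The gradient of the L1 loss has a bounded size, so the step length is set almost entirely by lr, which is why a value of 100 cannot settle near the optimum.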