Tags: python, machine-learning, regression, pytorch, gradient-descent

PyTorch Linear Regression Issue


I am trying to implement a simple linear model in PyTorch that can be given x data and y data and then trained to recognize the equation y = mx + b. However, whenever I test my model after training, it behaves as if the equation were y = mx + 2b. I'll show my code, and hopefully someone will be able to spot the issue. Thank you in advance for any help.

import torch

D_in = 500
D_out = 500
batch = 200

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, D_out),
)
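
For context, a torch.nn.Linear(D_in, D_out) layer holds a full D_out x D_in weight matrix plus a D_out-dimensional bias vector, so this model is fitting far more than a single slope and intercept. A quick way to see that (a small inspection sketch, using the model defined above):

for name, param in model.named_parameters():
    print(name, tuple(param.shape))  # prints: 0.weight (500, 500) and 0.bias (500,)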

Next, I create some data and set a rule. Let's do y = 3x + 4.

x_data = torch.rand(batch, D_in)
y_data = torch.randn(batch, D_out)  # placeholder values; every entry is overwritten below

for i in range(batch):
    for j in range(D_in):
        y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y = mx + c -> y = mx + 2c?
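
As an aside, the nested loop can be replaced with a single broadcast expression that applies the same rule to every entry at once:

y_data = 3 * x_data + 4  # equivalent to the loop above, but vectorized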

loss_fn = torch.nn.MSELoss(reduction='sum')  # reduction='sum' is the modern spelling of the deprecated size_average=False
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Now to training...

for epoch in range(500):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
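
If you log the loss periodically, you can check whether training has actually converged before testing; a minimal variant of the loop above:

for epoch in range(500):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(epoch, loss.item())  # a loss that is still falling suggests more steps are needed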

Then I test my model on a tensor of all ones.

test_data = torch.ones(batch, D_in)
y_pred = model(test_data)

Now, I'd expect to get 3*1 + 4 = 7, but instead, my model thinks it is 11.

[[ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516],
 [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516],
 ...,
 [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516]])

Similarly, if I change the rule to y = 3x + 8, my model guesses 19. So I am not sure what is going on: why is the constant being added twice? Notably, if I set the rule to y = 3x, my model correctly infers 3, and for y = mx in general it correctly infers m. For some reason, it is the constant term that throws it off. Any help solving this problem is much appreciated. Thanks!


Solution

  • Your network does not train for long enough. Each input is a vector of 500 features describing a single datum.

    The network has to map that 500-feature input to an output of 500 values. A single Linear(500, 500) layer has a full 500x500 weight matrix plus a 500-dimensional bias, i.e. 250,500 parameters, and your training data is randomly generated rather than as simple as the scalar rule suggests. So I think you just have to train longer for the weights to approximate this function from R^500 to R^500.

    If I reduce the input and output dimensionality and increase the batch size, learning rate, and number of training steps, I get the expected result:

    import torch

    D_in = 100
    D_out = 100
    batch = 512

    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, D_out),
    )

    x_data = torch.rand(batch, D_in)
    y_data = torch.randn(batch, D_out)

    for i in range(batch):
        for j in range(D_in):
            y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y = mx + c -> y = mx + 2c?

    loss_fn = torch.nn.MSELoss(reduction='sum')  # modern equivalent of size_average=False
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(10000):
        y_pred = model(x_data)
        loss = loss_fn(y_pred, y_data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    test_data = torch.ones(batch, D_in)
    y_pred = model(test_data)
    print(y_pred)
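
    To confirm the fit, you can also compare the learned parameters against the target mapping. If training has converged, the weight matrix should be close to 3 times the identity and every bias entry close to 4 (a quick check, reusing the trained model above):

    W = model[0].weight  # shape (D_out, D_in)
    b = model[0].bias    # shape (D_out,)
    print(W.diagonal().mean().item())  # should approach 3
    print(b.mean().item())             # should approach 4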
    

    If you just want to approximate f(x) = 3x + 4 with a single input, you could also set D_in and D_out to 1, as sketched below.
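
    In that one-dimensional version the layer has exactly one weight and one bias, which correspond directly to m and b. A minimal sketch along the same lines as the code above:

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(1, 1))

    x_data = torch.rand(512, 1)
    y_data = 3 * x_data + 4

    loss_fn = torch.nn.MSELoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(10000):
        y_pred = model(x_data)
        loss = loss_fn(y_pred, y_data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(model[0].weight.item())  # should approach 3 (the slope m)
    print(model[0].bias.item())    # should approach 4 (the intercept b)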