Tags: python, machine-learning, regression, pytorch, gradient-descent

PyTorch Linear Regression Issue


I am trying to implement a simple linear model in PyTorch that can be given x data and y data and then trained to recognize the equation y = mx + b. However, whenever I test my model after training, it behaves as if the equation were y = mx + 2b. I'll show my code, and hopefully someone will be able to spot the issue. Thank you in advance for any help.

import torch

D_in = 500
D_out = 500
batch = 200

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, D_out),
)
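
For context, a torch.nn.Linear(D_in, D_out) layer holds a full D_out x D_in weight matrix plus a D_out-dimensional bias vector, so this model is fitting far more than a single slope and intercept. A quick way to see that (a small inspection sketch, using the model defined above):

for name, param in model.named_parameters():
    print(name, tuple(param.shape))  # prints: 0.weight (500, 500) and 0.bias (500,)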

Next, I create some data and set a rule. Let's do y = 3x + 4.

x_data = torch.rand(batch, D_in)
y_data = torch.randn(batch, D_out)  # placeholder values; every entry is overwritten below

for i in range(batch):
    for j in range(D_in):
        y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y = mx + c -> y = mx + 2c?
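
As an aside, the nested loop can be replaced with a single broadcast expression that applies the same rule to every entry at once:

y_data = 3 * x_data + 4  # equivalent to the loop above, but vectorized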

loss_fn = torch.nn.MSELoss(reduction='sum')  # reduction='sum' is the modern spelling of the deprecated size_average=False
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Now to training...

for epoch in range(500):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
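
If you log the loss periodically, you can check whether training has actually converged before testing; a minimal variant of the loop above:

for epoch in range(500):
    y_pred = model(x_data)
    loss = loss_fn(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(epoch, loss.item())  # a loss that is still falling suggests more steps are needed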

Then I test my model on a tensor of all ones.

test_data = torch.ones(batch, D_in)
y_pred = model(test_data)

Now, I'd expect to get 3*1 + 4 = 7, but instead, my model thinks it is 11.

[[ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516],
 [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516],
 ...,
 [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,  10.7516]])

Similarly, if I change the rule to y = 3x + 8, my model guesses 19. So I am not sure what is going on: why is the constant being added twice? Notably, if I set the rule to y = 3x, my model correctly infers 3, and for y = mx in general it correctly infers m. For some reason, it is the constant term that throws it off. Any help solving this problem is much appreciated. Thanks!


Solution

  • Your network does not train for long enough. Each input is a vector of 500 features describing a single datum.

    The network has to map that 500-feature input to an output of 500 values. A single Linear(500, 500) layer has a full 500x500 weight matrix plus a 500-dimensional bias, i.e. 250,500 parameters, and your training data is randomly generated rather than as simple as the scalar rule suggests. So I think you just have to train longer for the weights to approximate this function from R^500 to R^500.

    If I reduce the input and output dimensionality and increase the batch size, learning rate, and number of training steps, I get the expected result:

    import torch

    D_in = 100
    D_out = 100
    batch = 512

    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, D_out),
    )

    x_data = torch.rand(batch, D_in)
    y_data = torch.randn(batch, D_out)

    for i in range(batch):
        for j in range(D_in):
            y_data[i][j] = 3 * x_data[i][j] + 4  # model thinks y = mx + c -> y = mx + 2c?

    loss_fn = torch.nn.MSELoss(reduction='sum')  # modern equivalent of size_average=False
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(10000):
        y_pred = model(x_data)
        loss = loss_fn(y_pred, y_data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    test_data = torch.ones(batch, D_in)
    y_pred = model(test_data)
    print(y_pred)
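
    To confirm the fit, you can also compare the learned parameters against the target mapping. If training has converged, the weight matrix should be close to 3 times the identity and every bias entry close to 4 (a quick check, reusing the trained model above):

    W = model[0].weight  # shape (D_out, D_in)
    b = model[0].bias    # shape (D_out,)
    print(W.diagonal().mean().item())  # should approach 3
    print(b.mean().item())             # should approach 4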
    

    If you just want to approximate f(x) = 3x + 4 with a single input, you could also set D_in and D_out to 1, as sketched below.
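
    In that one-dimensional version the layer has exactly one weight and one bias, which correspond directly to m and b. A minimal sketch along the same lines as the code above:

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(1, 1))

    x_data = torch.rand(512, 1)
    y_data = 3 * x_data + 4

    loss_fn = torch.nn.MSELoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(10000):
        y_pred = model(x_data)
        loss = loss_fn(y_pred, y_data)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(model[0].weight.item())  # should approach 3 (the slope m)
    print(model[0].bias.item())    # should approach 4 (the intercept b)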