python, numpy, machine-learning, linear-regression, gradient-descent

Getting infinity values from gradient descent


I'm trying to implement multivariable linear regression with gradient descent, but when I run this:

# Starting values
w = np.ones(3) # The number of features is 3
b = float(0)

def gradient_descent():
  global w
  global b

  learning_rate = 0.0001

  for i in range(x_train.shape[0]):
    prediction = np.dot(x_train[i], w) + b
    error = x_train[i] - prediction
    for j in range(w.shape[0]):
      w[j] = w[j] - (error * x_train[i][j] * learning_rate)
    b = b - (error * learning_rate)

def train():
  for i in range(10_000):
    gradient_descent()
    print(i, ':', w, b)

train()

the output is

0 : [inf inf inf] inf
1 : [inf inf inf] inf
2 : [inf inf inf] inf
3 : [inf inf inf] inf
4 : [inf inf inf] inf
5 : [inf inf inf] inf
6 : [inf inf inf] inf
....

So what did I do wrong? I tried decreasing the learning rate, but nothing changed.

data sample:

total_rooms,population,households,bedrooms(target)
5612.0,1015.0,472.0,1283.0
7650.0,1129.0,463.0,1901.0
720.0,333.0,117.0,174.0
1501.0,515.0,226.0,337.0
1454.0,624.0,262.0,326.0

where total_rooms, population, and households make up x_train with shape (17000, 3), and bedrooms is y_train with shape (17000, 1)

When I scale the data using sklearn.preprocessing.StandardScaler before splitting it into features and target

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
x_train = train_data[:, :3]
y_train = train_data[:, -1]

I get nan instead of inf!

Note: the data works fine with sklearn.linear_model.LinearRegression, whether scaled or not.


Solution

  • As suggested in the comments: feature scaling is a good idea (scikit-learn includes StandardScaler, but it's also straightforward to subtract each column's mean and divide by its standard deviation yourself; see the sketch below). With raw feature values in the thousands, each gradient step is huge and the weights quickly overflow to inf; after scaling, the sign error below still makes the loop diverge, and arithmetic on overflowed values (e.g. inf - inf) then produces the nan output instead.

    Also: the error term appears to be backwards, and it is computed against x_train[i] rather than the target. The residual is usually prediction - true value (a corrected training loop is sketched after the scaling example below):

    error = prediction - y_train[i]
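
    For the scaling step, here is a minimal sketch of the manual version, standardizing each feature column; this matches what StandardScaler's fit_transform does, and x_train is assumed to be the unscaled (17000, 3) feature array from the question:

    import numpy as np

    # Standardize features: subtract each column's mean and divide by
    # its standard deviation (the same transform StandardScaler applies).
    x_mean = x_train.mean(axis=0)
    x_std = x_train.std(axis=0)
    x_scaled = (x_train - x_mean) / x_std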
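
    Putting both fixes together, a corrected training loop might look like the sketch below. The learning rate and epoch count are illustrative, and x_scaled / y_train are assumed to be the standardized features from above and the (17000, 1) target from the question:

    y = y_train.ravel()              # flatten the target to shape (17000,)

    w = np.zeros(x_scaled.shape[1])  # one weight per feature
    b = 0.0
    learning_rate = 0.01             # illustrative value

    for epoch in range(100):         # illustrative epoch count
      for i in range(x_scaled.shape[0]):
        prediction = np.dot(x_scaled[i], w) + b
        error = prediction - y[i]                 # residual: prediction - target
        w -= learning_rate * error * x_scaled[i]  # vectorized weight update
        b -= learning_rate * error

    With standardized inputs and the residual pointing the right way, the updates stay bounded and the weights should converge instead of blowing up to inf.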