I'm trying to implement multivariable linear regression with gradient descent, but when I run this:
import numpy as np

# Starting values
w = np.ones(3)  # the number of features is 3
b = float(0)

def gradient_descent():
    global w
    global b
    learning_rate = 0.0001
    for i in range(x_train.shape[0]):
        prediction = np.dot(x_train[i], w) + b
        error = y_train[i] - prediction
        for j in range(w.shape[0]):
            w[j] = w[j] - (error * x_train[i][j] * learning_rate)
        b = b - (error * learning_rate)

def train():
    for i in range(10_000):
        gradient_descent()
        print(i, ':', w, b)

train()
the output is
0 : [inf inf inf] inf
1 : [inf inf inf] inf
2 : [inf inf inf] inf
3 : [inf inf inf] inf
4 : [inf inf inf] inf
5 : [inf inf inf] inf
6 : [inf inf inf] inf
....
So what did I do wrong? I tried decreasing the learning rate, but nothing changed.
data sample:
total_rooms,population,households,bedrooms(target)
5612.0,1015.0,472.0,1283.0
7650.0,1129.0,463.0,1901.0
720.0,333.0,117.0,174.0
1501.0,515.0,226.0,337.0
1454.0,624.0,262.0,326.0
where total_rooms, population, and households form x_train with shape (17000, 3), and bedrooms is y_train with shape (17000, 1).
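For context, train_data comes straight from the CSV, roughly like this ('housing.csv' is just a placeholder for the real file name):

import pandas as pd

# placeholder path; the real file has 17,000 rows
train_data = pd.read_csv('housing.csv').to_numpy()
x_train = train_data[:, :3]   # total_rooms, population, households
y_train = train_data[:, -1:]  # bedrooms, shape (17000, 1)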
When I try to scale the data using sklearn.preprocessing.StandardScaler before splitting it:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
x_train = train_data[:, :3]
y_train = train_data[:, -1]
I get nan instead of inf!
Note: the data works fine, with or without scaling, when I use sklearn.linear_model.LinearRegression.
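For reference, something along these lines fits without any problem:

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(x_train, y_train)
print(reg.coef_, reg.intercept_)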
As suggested in the comments: feature scaling is a good idea (scikit-learn includes StandardScaler, but it's also straightforward to subtract the mean of each column and divide by its standard deviation yourself).
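Something like this, for instance (a minimal sketch; the zero-variance guard is just a precaution for constant columns):

import numpy as np

def standardize(data):
    # column-wise mean and standard deviation
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (data - mean) / std

train_data = standardize(train_data)
x_train = train_data[:, :3]
y_train = train_data[:, -1]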
Also: the error term appears to be backwards; the residual is usually prediction - true:

error = prediction - y_train[i]
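Putting both fixes together, a sketch of the corrected function (assuming y_train has been flattened to shape (17000,), e.g. with y_train.ravel(), so that error is a scalar):

def gradient_descent():
    global w
    global b
    learning_rate = 0.0001
    for i in range(x_train.shape[0]):
        prediction = np.dot(x_train[i], w) + b
        error = prediction - y_train[i]  # residual: prediction minus true value
        for j in range(w.shape[0]):
            # step each weight down the gradient of the squared error
            w[j] = w[j] - (error * x_train[i][j] * learning_rate)
        b = b - (error * learning_rate)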