I have tried this piece of code for multivariable regression to find the coefficients, but I can't figure out where I am making a mistake, or whether I am on the right path. The problem is that the MSE value does not converge.
Here x1, x2, x3 are the 3 feature variables I have (I have sliced each feature column into these x1, x2, x3 variables):
import numpy as np

def gradientDescent(x, y):
    # coefficients for the three features plus the intercept, all starting at 0
    mCurrent1 = mCurrent2 = mCurrent3 = bCurrent = 0
    iteration = 1000
    learningRate = 0.0000001
    n = len(x)
    for i in range(iteration):
        # prediction uses the feature columns x1, x2, x3 from the enclosing scope
        y_predict = mCurrent1*x1 + mCurrent2*x2 + mCurrent3*x3 + bCurrent
        mse = (1/n) * np.sum((y - y_predict)**2)
        # partial derivatives of the MSE with respect to each coefficient
        mPartDerivative1 = -(2/n) * np.sum(x1 * (y - y_predict))
        mPartDerivative2 = -(2/n) * np.sum(x2 * (y - y_predict))
        mPartDerivative3 = -(2/n) * np.sum(x3 * (y - y_predict))
        bPartDerivative = -(2/n) * np.sum(y - y_predict)
        # take one step down the cost surface
        mCurrent1 = mCurrent1 - learningRate * mPartDerivative1
        mCurrent2 = mCurrent2 - learningRate * mPartDerivative2
        mCurrent3 = mCurrent3 - learningRate * mPartDerivative3
        bCurrent = bCurrent - learningRate * bPartDerivative
        print('m1:{} m2:{} m3:{} b:{} iter:{} mse:{}'.format(mCurrent1, mCurrent2, mCurrent3, bCurrent, i, mse))
    return (round(mCurrent1, 3), round(mCurrent2, 3), round(mCurrent3, 3), round(bCurrent, 3))
It looks like your program should work. However, it's likely that your learning rate is too small. Remember that the learning rate is the size of the step you take down your cost function. If the learning rate is too small, you move down the cost curve too slowly, and it takes a long time to reach convergence (requiring a very large number of iterations). If the learning rate is too large, you get the opposite problem: divergence. Picking the correct learning rate and number of iterations (in other words, tuning your hyperparameters) is more of an art than a science, so you should play around with different learning rates, for example with a sweep like the one sketched below.
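Here is a minimal sketch of such a sweep (my own illustration, not part of your code): it re-implements your update loop with the learning rate exposed as a parameter, and the candidate rates in the list are just illustrative guesses. It assumes x1, x2, x3, y are the NumPy arrays from your code.

import numpy as np

def gradient_descent_lr(x1, x2, x3, y, learning_rate, iterations=1000):
    # Same algorithm as gradientDescent above, but with the learning rate
    # (and iteration count) exposed as parameters so they can be swept.
    m1 = m2 = m3 = b = 0.0
    n = len(y)
    for _ in range(iterations):
        error = y - (m1*x1 + m2*x2 + m3*x3 + b)
        m1 -= learning_rate * (-(2/n) * np.sum(x1 * error))
        m2 -= learning_rate * (-(2/n) * np.sum(x2 * error))
        m3 -= learning_rate * (-(2/n) * np.sum(x3 * error))
        b -= learning_rate * (-(2/n) * np.sum(error))
    mse = (1/n) * np.sum((y - (m1*x1 + m2*x2 + m3*x3 + b))**2)
    return m1, m2, m3, b, mse

# Illustrative candidate rates; compare the final MSE for each.
for lr in [1e-7, 1e-4, 1e-2, 1e-1]:
    m1, m2, m3, b, mse = gradient_descent_lr(x1, x2, x3, y, lr)
    print('lr={:g} -> mse={:.4f}'.format(lr, mse))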
I created my own dataset with randomly generated data (where (m1, m2, m3, b) = (10, 5, 4, 2)) and ran your code:
import pandas as pd
import numpy as np
x1 = np.random.rand(100,1)
x2 = np.random.rand(100,1)
x3 = np.random.rand(100,1)
y = 2 + 10 * x1 + 5 * x2 + 4 * x3 + 2 * np.random.randn(100,1)
df = pd.DataFrame(np.c_[y,x1,x2,x3],columns=['y','x1','x2','x3'])
#df.head()
# y x1 x2 x3
# 0 11.970573 0.785165 0.012989 0.634274
# 1 19.980349 0.919672 0.971063 0.752341
# 2 2.884538 0.170164 0.991058 0.003270
# 3 8.437686 0.474261 0.326746 0.653011
# 4 14.026173 0.509091 0.921010 0.375524
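With this data in place, the function can be called like so (gradientDescent reads x1, x2, x3 from the enclosing scope, and its x argument is only used for n = len(x), so passing the stacked feature matrix works; stacking the columns this way is my own choice, not something your code requires):

X = np.c_[x1, x2, x3]  # only used for n = len(x) inside the function
m1, m2, m3, b = gradientDescent(X, y)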
Running your algorithm with a learning rate of 0.0000001 yields the following results:

(m1, m2, m3, b) = (0.001, 0.001, 0.001, 0.002)
Running your algorithm with a learning rate of 0.1 yields the following results:

(m1, m2, m3, b) = (9.382, 4.841, 4.117, 2.485)
Notice that when the learning rate is 0.0000001, your coefficients end up not far from where they started (0). As I said earlier, a learning rate that small changes the coefficients too slowly, because you move down the cost function in extremely small steps.
I have added a picture to help visualize picking a step size. Notice that the first picture uses a small learning rate, and the second uses a larger learning rate.
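Here is a minimal matplotlib sketch that generates that kind of picture (my own illustration on a toy one-variable cost J(m) = m**2, not taken from your code): it marks each gradient descent step on the cost curve, once with a small learning rate and once with a larger one.

import numpy as np
import matplotlib.pyplot as plt

def descend(lr, steps=15, start=9.0):
    # Gradient descent on the toy cost J(m) = m**2, whose gradient is 2*m.
    path = [start]
    m = start
    for _ in range(steps):
        m -= lr * 2 * m
        path.append(m)
    return np.array(path)

ms = np.linspace(-10, 10, 200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, lr in zip(axes, [0.01, 0.3]):  # illustrative small vs. larger rate
    path = descend(lr)
    ax.plot(ms, ms**2)            # the cost curve
    ax.plot(path, path**2, 'o-')  # the steps taken down the curve
    ax.set_title('learning rate = {}'.format(lr))
    ax.set_xlabel('m')
axes[0].set_ylabel('cost')
plt.show()

With the small rate the marked points barely move from the starting position, while the larger rate walks visibly down toward the minimum, which is exactly the behavior the two coefficient runs above show.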