python, machine-learning, linear-regression, gradient-descent

Can't understand the number of iterations in linear regression (Machine Learning)


I've been trying to get my head around machine learning for a few days, watching videos and reading articles all over the internet.
In this video, Siraj (the guy in the video) shows how to build gradient descent from scratch with NumPy and Python. Here is the code:

import numpy as np

ERROR = []

def compute_error_for_given_points(b, m, points):
    # Mean squared error of the line y = m * x + b over all points
    total_error = 0.0

    for i in range(0,len(points)):
        x = points[i,0]
        y = points[i,1]
        total_error += (y-(m * x + b))**2

    return total_error/float(len(points))

def step_gradient(b_current, m_current, points, learning_rate):
    # One gradient-descent update: nudge b and m against the gradient of the mean squared error
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))

    for i in range(0,int(N)):
        x = points[i, 0]
        y = points[i, 1]
        # accumulate the partial derivatives of the mean squared error w.r.t. b and m
        b_gradient += -(2/N) * (y - (m_current*x + b_current))
        m_gradient += -(2/N) * x * (y - (m_current*x + b_current))
    new_b = b_current - (learning_rate * b_gradient)
    new_m = m_current - (learning_rate * m_gradient)

    return new_b,new_m

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iteration):
    # Repeat the gradient step num_iteration times, recording the error at each step
    b = starting_b
    m = starting_m

    for i in range(num_iteration):
        ERROR.append(compute_error_for_given_points(b,m,points))
        b, m  = step_gradient(b, m, points, learning_rate)

    return b,m


def run():
    points = np.genfromtxt('data.csv',delimiter=',')
    #print(type(points))
    #print(points)
    #hyperparameter
    learning_rate = 0.0001

    #y = mx + b
    initial_b = 0
    initial_m = 0
    num_iteration = 100000

    b,m = gradient_descent_runner(points,initial_b,initial_m,learning_rate,num_iteration)
    print("OPTIMIZED: ",b)
    print("OPTIMIZED: ",m)
    print("Error: ",compute_error_for_given_points(b,m,points))
    print("\n")
    #print(ERROR)


if __name__ == '__main__':
    run()

I understand all the math and calculus. But I can't get the idea behind the variable num_iteration. He said to use 1000, and I got some values for b and m. But when I use more than 1000 iterations, I get different values of b and m. Also, in the loop where num_iteration is used, couldn't we replace it with a while loop whose condition is to keep going until we reach the lowest value of the cost function? So if you could give me some insight into this num_iteration variable, that would be very helpful.

Thanks in advance.


Solution

  • To put a condition on the while loop saying 'go till the lowest value of the cost function', you would first have to know what that lowest value is, and you won't know that until the program has actually finished converging. There is no way to know it beforehand.

    So instead you run the loop for an arbitrary fixed number of iterations, in your case 1000, and hope that by that many iterations the cost has reached its minimum (or at least a sensible value). A common refinement is to stop once the error stops improving; a sketch of that idea is given at the end of this answer.

    EDIT

    This is what I am getting after running the code for 100000 iterations -

    OPTIMIZED:  4.247984440219189
    OPTIMIZED:  1.3959992655297515
    Error:  110.7863192974508
    

    For 1000 iterations -

    OPTIMIZED:  0.08893651993741346
    OPTIMIZED:  1.4777440851894448
    Error:  112.61481011613473
    

    When the code is run for 1000 iterations, the error is higher than what we get with 100000 iterations. This is as expected: the more iterations you run, the closer the parameters get to the optimum. With 100000 iterations, the code keeps adjusting b and m until the error comes down to about 110. With 1000 iterations it does exactly the same thing, but it stops after 1000 steps, before b and m have finished converging, which is why you see different values for them.
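    If you really want the loop to decide for itself when to stop, the usual compromise is not 'stop at the lowest possible cost' (which you can't know in advance) but 'stop when the cost has stopped improving by more than some small tolerance'. Below is a minimal sketch of that idea, reusing step_gradient and compute_error_for_given_points from the question; the tolerance of 1e-9 and the safety cap of 1000000 iterations are arbitrary values chosen for illustration, not anything from the video.

    def gradient_descent_until_converged(points, starting_b, starting_m,
                                         learning_rate, tol=1e-9, max_iteration=1000000):
        # Step until the error improves by less than `tol` per iteration,
        # or until the safety cap `max_iteration` is hit (so a bad learning
        # rate cannot make the loop run forever).
        b, m = starting_b, starting_m
        previous_error = compute_error_for_given_points(b, m, points)

        for i in range(max_iteration):
            b, m = step_gradient(b, m, points, learning_rate)
            current_error = compute_error_for_given_points(b, m, points)
            if previous_error - current_error < tol:   # barely improving any more
                break
            previous_error = current_error

        return b, m, i + 1

    # Hypothetical usage, replacing the fixed num_iteration loop:
    # b, m, steps_used = gradient_descent_until_converged(points, 0, 0, 0.0001)

    Even this does not guarantee you have reached the true minimum: with a very small learning rate the per-step improvement can drop below the tolerance while b and m are still drifting, which is exactly why a fixed, generous num_iteration is such a common default. You can also simply plot the ERROR list your code already collects (for example with matplotlib) and check whether the curve has flattened out.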