Search code examples
pythonmachine-learninglinear-regressiongradient-descent

Cost Function Increases, Then Stops Growing


I understand the zig-zag nature of the cost function when applying gradient descent, but what bothers me is that the cost started out at a low 300 only to increase to 1600 in the end.

The cost function would oscillate between 300 and 4000 to end up at 1600. I thought I should get a number that is 300 or lower. I have tried changing the learning rate and all it does is still take me to 1600. I should get a cost around 300, not one that grows it.

Data:

square_feet = [1661.0, 871.0, 1108.0, 1453.0, 1506.0, 1100.0, 1266.0, 1514.0, 948.0, 1878.0, 1522.0, 931.0, 1475.0, 1177.0, 1844.0, 1469.0, 2155.0, 967.0, 1092.0]
prices = [1350.0, 489.0, 589.0, 539.0, 775.0, 575.0, 749.0, 795.0, 644.9, 590.0, 575.0, 699.0, 999.0, 775.0, 599.0, 599.0, 895.0, 550.0, 849.0]

Both of those lists are Pandas series in the original code, but have converted them to lists here for clarity.

Main:

# Add starting weight and bias
w_init = 5e-1 # Increase in price for every 1 square feet
b_init = 200 # Starting price for the cheapest houses

# Iterations and learning rate for the gradient descent algorithm
iterations = 10000
alpha = 1.0e-6 # Delicate and causes a divergence if it's set too large

w_final, b_final, J_hist, p_hist = gradient_descent(
    square_feet, prices, w_init, b_init, alpha, iterations)

print(f'w: {w_final}, b: {b_final}, Costs: {J_hist}, Weight and Bias: {p_hist}')

Functions:

# Cost Function to determine cumulative error between real and predicted values
def cost_function(x, y, w, b):

    # 1) Number of training examples
    m = x.size
    cost = 0

    # 2) Index the training examples and account for cost per instance
    for i in range(m):
        y_hat = w * x[i] + b      
        cost += (y_hat-y[i])**2
        cost /= 2 * m

    # 3) Return total cost
    return cost

# Compute the gradient, i.e., the scalar that improves accuracy
def gradient_function(x, y, w, b):

    m = x.size

    # Partial derivatives of the cost function with respect to weight and bias
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        y_hat = w * x[i] + b
        dj_dw_i = (y_hat - y[i]) * x[i]
        dj_db_i = (y_hat - y[i])
        dj_db += dj_db_i
        dj_dw += dj_dw_i

    dj_dw /= m
    dj_db /= m

    return dj_dw, dj_db

def gradient_descent(x, y, w_init, b_init, learning_rate, 
    num_iters):
    
    # Used for graphing
    J_history = []
    p_history = []
    b = b_init
    w = w_init

    for i in range(num_iters):
        dj_dw, dj_db = gradient_function(x, y, w, b) # Gradient

        # Update weight, bias
        w -= learning_rate * dj_dw
        b -= learning_rate * dj_db
        
        # Prevents resource exhaustion; unnecessary to store similar costs
        # Past 100000 iterations
        if i < 100000:
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w,b])
    
    return w, b, J_history, p_history

I'm stumped over this issue.


Solution

  • In your cost_function, the average cost is inside the loop, which results in a cost value that is m times smaller than it should be, preventing the data set from converging. You should sum up all data before averaging.

    def cost_function(x, y, w, b):
    
        # 1) Number of training examples
        m = x.size
        cost = 0
    
        # 2) Index the training examples and account for cost per instance
        for i in range(m):
            y_hat = w * x[i] + b      
            cost += (y_hat-y[i])**2
        cost /= 2 * m
    
        # 3) Return total cost
        return cost