Tags: python, machine-learning, linear-regression, gradient-descent

Why is my gradient descent algorithm not working correctly?


I am trying to port the gradient descent algorithm for linear regression from Andrew Ng's Machine Learning course to Python, but for some reason my implementation is not working correctly.

Here's my implementation in Octave, it works correctly:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

m = length(y);  % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    prediction = X*theta;           % hypothesis: X (m x 2) times theta (2 x 1)
    margin_error = prediction - y;  % residuals (m x 1)

    gradient = 1/m * (alpha * (X' * margin_error));  % (2 x 1), alpha folded in
    theta = theta - gradient;

    J_history(iter) = computeCost(X, y, theta);

end

end
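Both versions are meant to implement the standard batch gradient descent update, with the learning rate folded into the gradient variable:

\theta := \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)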

However, when I translate this to Python, for some reason it does not give accurate results. The cost goes up with each iteration rather than down.

Here's my implementation in Python:

def gradientDescent(x, y, theta, alpha, iters):
    m = len(y)

    J_history = np.matrix(np.zeros((iters,1)))

    for i in range(iters):
        prediction = x*theta.T
        margin_error = prediction - y

        gradient = 1/m * (alpha * (x.T * margin_error))
        theta = theta - gradient

        J_history[i] = computeCost(x,y,theta)

    return theta, J_history

The code runs without raising any errors. For reference, this is theta:

theta = np.matrix(np.array([0,0]))

alpha and iters are set to:

alpha = 0.01
iters = 1000

When I run opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:

matrix([[  2.36890383e+16,  -1.40798902e+16],
        [  2.47503758e+17,  -2.36890383e+16]])

when I should get this:

matrix([[-3.24140214, 1.1272942 ]])

What am I doing wrong?

Edit:

Cost function

def computeCost(x, y, theta):
    # Number of training examples
    m = len(y)

    # Take theta transpose because theta is stored as a 1x2 matrix, e.g. [0, 0]
    prediction = x * theta.T

    # Mean squared error cost
    J = 1/(2*m) * np.sum(np.power((prediction - y), 2))

    return J
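As a quick shape check, here is how the function behaves on a tiny hypothetical data set (the values below are made up for illustration, not from the course):

import numpy as np

# Toy data: m = 3 examples, bias column of ones already added
x = np.matrix([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])           # shape (3, 2)
y = np.matrix([[1.0], [2.0], [3.0]])  # shape (3, 1)
theta = np.matrix([0, 0])             # shape (1, 2)

# x * theta.T is (3, 2) * (2, 1) -> (3, 1), matching y's shape
print(computeCost(x, y, theta))       # 14/6 = 2.333... for this toy data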

Solution

  • Look at this:

    >>> A = np.matrix([3,3,3])
    >>> B = np.matrix([[1,1,1], [2,2,2]])
    >>> A-B
    matrix([[2, 2, 2],
            [1, 1, 1]])
    

    The two matrices are broadcast together.

    "It's because np.matrix inherits from np.ndarray; np.matrix overrides multiplication, but not addition and subtraction."

    In your situation, theta (1x2) minus gradient (2x1) broadcasts to a 2x2 result. Transpose the gradient before subtracting:

    theta = theta - gradient.T
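
    Putting it together, a minimal corrected sketch of the question's Python function (keeping the np.matrix types from the question; the only change is the transposed gradient):

    def gradientDescent(x, y, theta, alpha, iters):
        m = len(y)

        J_history = np.matrix(np.zeros((iters, 1)))

        for i in range(iters):
            prediction = x * theta.T          # (m, 2) * (2, 1) -> (m, 1)
            margin_error = prediction - y     # (m, 1)

            gradient = 1/m * (alpha * (x.T * margin_error))  # (2, 1)
            theta = theta - gradient.T        # transpose so shapes match (1, 2)

            J_history[i] = computeCost(x, y, theta)

        return theta, J_history

    With alpha = 0.01 and iters = 1000, the cost should now decrease each iteration and theta should converge toward values like the expected matrix shown in the question.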