python numpy machine-learning regression gradient-descent

Python gradient descent - cost keeps increasing

I'm trying to implement gradient descent in python and my loss/cost keeps increasing with every iteration.

I've seen a few people post about this, and saw an answer here: gradient descent using python and numpy

I believe my implementation is similar, but cant see what I'm doing wrong to get an exploding cost value:

Iteration: 1 | Cost: 697361.660000
Iteration: 2 | Cost: 42325117406694536.000000
Iteration: 3 | Cost: 2582619233752172973298548736.000000
Iteration: 4 | Cost: 157587870187822131053636619678439702528.000000
Iteration: 5 | Cost: 9615794890267613993157742129590663647488278265856.000000

I'm testing this on a dataset I found online (LA Heart Data): http://www.umass.edu/statdata/statdata/stat-corr.html

Import code:

dataset = np.genfromtxt('heart.csv', delimiter=",")

x = dataset[:]
x = np.insert(x,0,1,axis=1)  # Add 1's for bias
y = dataset[:,6]
y = np.reshape(y, (y.shape[0],1))

Gradient descent:

def gradientDescent(weights, X, Y, iterations = 1000, alpha = 0.01):
    theta = weights
    m = Y.shape[0]
    cost_history = []

    for i in xrange(iterations):
        residuals, cost = calculateCost(theta, X, Y)
        gradient = (float(1)/m) * np.dot(residuals.T, X).T
        theta = theta - (alpha * gradient)

        # Store the cost for this iteration
        cost_history.append(cost)
        print "Iteration: %d | Cost: %f" % (i+1, cost)

Calculate cost:

def calculateCost(weights, X, Y):
    m = Y.shape[0]
    residuals = h(weights, X) - Y
    squared_error = np.dot(residuals.T, residuals)

    return residuals, float(1)/(2*m) * squared_error

Calculate hypothesis:

def h(weights, X):   
    return np.dot(X, weights)

To actually run it:

gradientDescent(np.ones((x.shape[1],1)), x, y, 5)

Solution

Assuming that your derivation of the gradient is correct, you are using: =- and you should be using: -=. Instead of updating theta, you are reassigning it to - (alpha * gradient)

EDIT (after the above issue was fixed in the code):

I ran what the code on what I believe is the right dataset and was able to get the cost to behave by setting alpha=1e-7. If you run it for 1e6 iterations you should see it converging. This approach on this dataset appears very sensitive to learning rate.