I am trying to mimic the gradient descent algorithm for linear regression from Andrew NG's Machine learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave, it works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
prediction = X*theta;
margin_error = prediction - y;
gradient = 1/m * (alpha * (X' * margin_error));
theta = theta - gradient;
J_history(iter) = computeCost(X, y, theta);
end
end
However, when I translate this to Python for some reason it is not giving me accurate results. The cost seems to be going up rather than descending.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
m = len(y)
J_history = np.matrix(np.zeros((iters,1)))
for i in range(iters):
prediction = x*theta.T
margin_error = prediction - y
gradient = 1/m * (alpha * (x.T * margin_error))
theta = theta - gradient
J_history[i] = computeCost(x,y,theta)
return theta,J_history
My code is compiling and there isn't anything wrong. Please note this is theta:
theta = np.matrix(np.array([0,0]))
Alpha and iters is set to this:
alpha = 0.01
iters = 1000
When I run it, opt_theta, cost = gradientDescent(x, y, theta, alpha, iters)
and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
# Get length of data set
m = len(y)
# We get theta transpose because we are working with a numpy array [0,0] for example
prediction = x * theta.T
J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
return J
Look there:
>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
[1, 1, 1]])
Matrices are broadcasted together.
In yours situation theta(1x2) subtract gradient(2x1) and in result you have got 2x2. Try to transpose gradient before subtracting.
theta = theta - gradient.T