I'm trying to figure out the Python code for the multivariate gradient descent algorithm, and have found several implementations like this:
import numpy as np

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
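For context, here is a minimal way to try the function above on a tiny synthetic dataset (this snippet renames it to `gradient_descent` and drops the per-iteration print; the data and hyperparameters are illustrative choices, not from the original post):

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, m, num_iterations):
    x_trans = x.transpose()
    for i in range(num_iterations):
        hypothesis = np.dot(x, theta)          # predictions, shape (m,)
        loss = hypothesis - y                  # residuals
        gradient = np.dot(x_trans, loss) / m   # average gradient over examples
        theta = theta - alpha * gradient
    return theta

# toy data: y = 1 + 2*x1, with a bias column of ones prepended
m = 50
x1 = np.linspace(0.0, 1.0, m)
x = np.column_stack([np.ones(m), x1])
y = 1.0 + 2.0 * x1

theta = gradient_descent(x, y, np.zeros(2), alpha=0.5, m=m, num_iterations=5000)
# theta should end up close to [1.0, 2.0]
```

On noiseless linear data like this, the fit recovers the true coefficients to within a small tolerance.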
From the definition of gradient descent, the update rule is:

theta_j := theta_j - (alpha / m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i)) * x_j^(i)
However, in numpy, it is being calculated as: np.dot(xTrans, loss) / m
Can someone please explain how we get this numpy expression?
The code is actually very straightforward; it would be beneficial to spend a bit more time reading it.

`hypothesis - y` is the vector of residuals, stored in the `loss` variable, and its components are what appear in the gradient of the squared loss. The calculation of the hypothesis looks like it's for linear regression. `xTrans` is the transpose of `x`, so the dot product `np.dot(xTrans, loss)` computes, for each feature, the sum over all examples of that feature's value times the residual; dividing by `m` then gives the average, which is exactly the sum in the update rule.

Other than that, the code has some Python style issues. We typically use `under_score` instead of `camelCase` in Python, so for example the function should be `gradient_descent`. More legible than Java, isn't it? :)
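To convince yourself numerically that the vectorized expression matches the per-example sum from the update rule, you can compare the two on a small hand-made dataset (the numbers here are arbitrary):

```python
import numpy as np

# 3 examples, 2 features (first column is a bias of ones)
x = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
theta = np.array([0.5, -1.0])
y = np.array([0.0, 1.0, 2.0])
m = len(y)

loss = np.dot(x, theta) - y

# vectorized gradient, as in the question
vectorized = np.dot(x.transpose(), loss) / m

# explicit per-example sum: (1/m) * sum_i loss_i * x_i
looped = sum(loss[i] * x[i] for i in range(m)) / m

print(np.allclose(vectorized, looped))  # True
```

Each row `x[i]` scaled by its residual `loss[i]` contributes one term of the sum; stacking those contributions is precisely what the matrix-vector product `xTrans @ loss` does.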