
Understanding gradient of gradient descent algorithm in Numpy


I'm trying to figure out the Python code for the multivariate gradient descent algorithm, and have found several implementations like this one:

import numpy as np

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

From the definition of gradient descent, the gradient of the cost with respect to each parameter $\theta_j$ is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

However, in NumPy it is calculated as np.dot(xTrans, loss) / m. Can someone please explain how this NumPy expression corresponds to the formula above?


Solution

  • The code is actually quite straightforward; it's worth spending a bit more time reading it.

    • hypothesis - y is the first part of the squared loss's gradient (a vector with one entry per training example), and this is stored in the loss variable. The calculation of the hypothesis looks like it's for linear regression.
    • xTrans is the transpose of x, so np.dot(xTrans, loss) computes, for each feature j, the sum over all examples of loss[i] * x[i][j], which is exactly the summation in the formula (see the numerical check after this list).
    • We then divide by m to get the average over the training examples.
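    If it helps, here is a minimal numerical check (the toy arrays and values below are made up purely for illustration) showing that the vectorized expression np.dot(x.T, loss) / m gives the same result as writing out the summation from the formula feature by feature:

import numpy as np

# Toy data: m = 3 examples, 2 features (arbitrary values, for illustration only)
x = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.0, 2.5, 3.5])
theta = np.array([0.1, 0.2])
m = x.shape[0]

loss = np.dot(x, theta) - y  # h_theta(x^(i)) - y^(i), one entry per example

# Vectorized gradient, as in the question
grad_vec = np.dot(x.T, loss) / m

# Explicit summation, following the formula term by term
grad_sum = np.zeros_like(theta)
for j in range(x.shape[1]):
    grad_sum[j] = sum(loss[i] * x[i, j] for i in range(m)) / m

print(np.allclose(grad_vec, grad_sum))  # True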

    Other than that, the code has some Python style issues. We typically use under_score instead of camelCase in Python, so for example the function should be named gradient_descent (see the sketch below). More legible than Java, isn't it? :)
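    For reference, a PEP 8-flavored rewrite of the same function might look like the following sketch; the behavior is unchanged, only the names are tidied up:

import numpy as np

def gradient_descent(x, y, theta, alpha, m, num_iterations):
    """Batch gradient descent for linear regression with squared-error cost."""
    x_trans = x.transpose()
    for i in range(num_iterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        gradient = np.dot(x_trans, loss) / m  # average gradient over all examples
        theta = theta - alpha * gradient
    return theta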