I'm trying to figure out the Python code for the multivariate gradient descent algorithm, and have found several implementations like this:
import numpy as np

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
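For context, here is a minimal way to try the function above on a tiny synthetic dataset (this snippet renames it to `gradient_descent` and drops the per-iteration print; the data and hyperparameters are illustrative choices, not from the original post):

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, m, num_iterations):
    x_trans = x.transpose()
    for i in range(num_iterations):
        hypothesis = np.dot(x, theta)          # predictions, shape (m,)
        loss = hypothesis - y                  # residuals
        gradient = np.dot(x_trans, loss) / m   # average gradient over examples
        theta = theta - alpha * gradient
    return theta

# toy data: y = 1 + 2*x1, with a bias column of ones prepended
m = 50
x1 = np.linspace(0.0, 1.0, m)
x = np.column_stack([np.ones(m), x1])
y = 1.0 + 2.0 * x1

theta = gradient_descent(x, y, np.zeros(2), alpha=0.5, m=m, num_iterations=5000)
# theta should end up close to [1.0, 2.0]
```

On noiseless linear data like this, the fit recovers the true coefficients to within a small tolerance.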
From the definition of gradient descent, the update rule is:

theta_j := theta_j - (alpha / m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i)) * x_j^(i)
However, in numpy, it is being calculated as: np.dot(xTrans, loss) / m
Can someone please explain how we get this numpy expression?
The code is actually very straightforward; it would be beneficial to spend a bit more time reading it.

`hypothesis - y` is the vector of residuals, stored in the `loss` variable, and its components are what appear in the gradient of the squared loss. The calculation of the hypothesis looks like it's for linear regression. `xTrans` is the transpose of `x`, so the dot product `np.dot(xTrans, loss)` computes, for each feature, the sum over all examples of that feature's value times the residual; dividing by `m` then gives the average, which is exactly the sum in the update rule.

Other than that, the code has some Python style issues. We typically use `under_score` instead of `camelCase` in Python, so for example the function should be `gradient_descent`. More legible than Java, isn't it? :)
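To convince yourself numerically that the vectorized expression matches the per-example sum from the update rule, you can compare the two on a small hand-made dataset (the numbers here are arbitrary):

```python
import numpy as np

# 3 examples, 2 features (first column is a bias of ones)
x = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
theta = np.array([0.5, -1.0])
y = np.array([0.0, 1.0, 2.0])
m = len(y)

loss = np.dot(x, theta) - y

# vectorized gradient, as in the question
vectorized = np.dot(x.transpose(), loss) / m

# explicit per-example sum: (1/m) * sum_i loss_i * x_i
looped = sum(loss[i] * x[i] for i in range(m)) / m

print(np.allclose(vectorized, looped))  # True
```

Each row `x[i]` scaled by its residual `loss[i]` contributes one term of the sum; stacking those contributions is precisely what the matrix-vector product `xTrans @ loss` does.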