Tags: python, numpy, gradient-descent

Gradient Descent Variation doesn't work


I am trying to implement the stochastic gradient descent algorithm. The first solution works:

import numpy as np

def gradientDescent(x,y,theta,alpha):
    xTrans = x.transpose()
    for i in range(0,99):
        hypothesis = np.dot(x,theta)
        loss = hypothesis - y
        gradient = np.dot(xTrans,loss)
        theta = theta - alpha * gradient

    return theta

This solution gives the correct theta values, but the following algorithm doesn't work:

def gradientDescent2(x,y,theta,alpha):
    xTrans = x.transpose()
    for i in range(0,99):
        hypothesis = np.dot(x[i],theta)
        loss = hypothesis - y[i]
        gradientThetaZero= loss * x[i][0]
        gradientThetaOne = loss * x[i][1]
    theta[0] = theta[0] - alpha * gradientThetaZero
    theta[1] = theta[1] - alpha * gradientThetaOne
    return theta

I don't understand why solution 2 does not work; it basically does the same thing as the first algorithm.

I use the following code to produce data:

def genData():
    x = np.random.rand(100,2)
    y = np.zeros(shape=100)
    for i in range(0, 100):
        x[i][0] = 1
        # add a little uniform noise
        e = np.random.uniform(-0.1,0.1,size=1)
        # our target variable
        y[i] = np.sin(2*np.pi*x[i][1]) + e[0]
    return x,y

And I use it as follows:

x,y = genData()
theta = np.ones(2)
theta = gradientDescent2(x,y,theta,0.005)
print(theta)

I hope you can help me! Best regards, Felix


Solution

  • Your second code example overwrites the computed gradient in each iteration over your observation data.

    In the first code snippet, you properly adjust your parameters in each loop iteration based on the error (the loss).

    In the second code snippet, you compute the point-wise gradient in each iteration, but then don't do anything with it: each new gradient simply overwrites the previous one, and the parameter update happens only once, after the loop has finished. That means your final update effectively trains on the very last data point only.

    If instead you accumulate the gradients within the loop by summing them ( += ), the update becomes the gradient of the loss function with respect to your parameters over the entire observation set, which should be much closer to what you're looking for; a sketch of that fix follows below.
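
    As a concrete illustration, here is a minimal sketch of that fix, together with a per-sample variant closer to the stochastic gradient descent the question aims for. The function names, the outer loop of 100 passes (mirroring the 100 update steps of the first snippet), and the use of len(y) instead of a hard-coded 99 are my own assumptions, not code from the original post:

    import numpy as np

    def gradientDescent2Accumulated(x, y, theta, alpha):
        # Batch-style fix: accumulate the per-sample gradients with +=
        # and update theta once per full pass over the data.
        for _ in range(100):
            gradientThetaZero = 0.0
            gradientThetaOne = 0.0
            for i in range(len(y)):  # visits every sample, not just the first 99
                hypothesis = np.dot(x[i], theta)
                loss = hypothesis - y[i]
                gradientThetaZero += loss * x[i][0]
                gradientThetaOne += loss * x[i][1]
            theta[0] = theta[0] - alpha * gradientThetaZero
            theta[1] = theta[1] - alpha * gradientThetaOne
        return theta

    def gradientDescent2Stochastic(x, y, theta, alpha):
        # Stochastic variant: one parameter update per sample instead of
        # one per pass, which is what is usually meant by stochastic
        # gradient descent.
        for _ in range(100):
            for i in range(len(y)):
                hypothesis = np.dot(x[i], theta)
                loss = hypothesis - y[i]
                theta[0] = theta[0] - alpha * loss * x[i][0]
                theta[1] = theta[1] - alpha * loss * x[i][1]
        return theta

    Called with the same setup as in the question (theta = np.ones(2), alpha = 0.005 and the data from genData), both sketches should end up close to the theta values produced by the first, working snippet.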