python, gradient-descent

Gradient descent function in Python - error in loss function or weights


I'm working on a gradient descent function for an exercise, but I still can't get the expected outcome. Specifically, I receive two error messages:

  1. Wrong output for the loss function. Check how you are implementing the matrix multiplications.

  2. Wrong values for weight's matrix theta. Check how you are updating the matrix of weights.

When applying the function (see below), I notice that the cost decreases at each iteration, but it still does not converge to the outcome the exercise expects. I have already tried several adaptations of the formula but haven't solved it yet.

# gradientDescent

def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE ###
    # get 'm', the number of rows in matrix x
    m = len(x)

    for i in range(0, num_iters):

        # get z, the dot product of x and theta
        # z = predictions
        z = np.dot(x, theta)
        h = sigmoid(z)
        loss = z - y

        # calculate the cost function
        J = (-1/m) * np.sum(loss)
        print("Iteration %d | Cost: %f" % (i, J))

        gradient = np.dot(x.T, loss)

        # update theta
        theta = theta - (1/m) * alpha * gradient

    ### END CODE HERE ###
    J = float(J)
    return J, theta

Solution

  • The issue was that I applied the wrong formulas for the cost function and for the weight update. The correct ones are:

    $$J = -\frac{1}{m} \times \left(\mathbf{y}^T \cdot \log(\mathbf{h}) + (1-\mathbf{y})^T \cdot \log(1-\mathbf{h})\right)$$

    $$\theta = \theta - \frac{\alpha}{m} \times \left(\mathbf{x}^T \cdot (\mathbf{h}-\mathbf{y})\right)$$

    The solution is:

     J = (-1/m) * (np.dot(y.T, np.log(h)) + np.dot((1-y).T, np.log(1-h)))
     gradient = np.dot(x.T, h - y)
     theta = theta - (alpha/m) * gradient
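
For anyone who wants to run the whole thing, here is a minimal sketch of the function with both formulas applied. It assumes `sigmoid` is the standard logistic function (the exercise provides its own, which is not shown in the question), and the toy data at the bottom is made up purely for illustration. I also use `J.item()` rather than `float(J)`, because `J` comes out of `np.dot` as a (1,1) array.

```python
import numpy as np


def sigmoid(z):
    # Assumed helper: standard logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))


def gradientDescent(x, y, theta, alpha, num_iters):
    """Batch gradient descent for logistic regression.

    x: (m, n+1) features, y: (m, 1) labels in {0, 1}, theta: (n+1, 1) weights.
    Returns the last computed cost J (a float) and the final theta.
    Assumes num_iters >= 1, otherwise J is never computed.
    """
    m = x.shape[0]
    for _ in range(num_iters):
        # predicted probabilities, shape (m, 1)
        h = sigmoid(np.dot(x, theta))
        # cross-entropy cost, a (1, 1) array
        J = (-1.0 / m) * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h)))
        # the weight update uses h - y, not z - y
        theta = theta - (alpha / m) * np.dot(x.T, h - y)
    return J.item(), theta


# Hypothetical toy data: a bias column plus two random features.
rng = np.random.default_rng(0)
x = np.hstack([np.ones((10, 1)), rng.random((10, 2))])
y = (rng.random((10, 1)) > 0.5).astype(float)

J_first, _ = gradientDescent(x, y, np.zeros((3, 1)), alpha=1e-2, num_iters=1)
J_last, theta = gradientDescent(x, y, np.zeros((3, 1)), alpha=1e-2, num_iters=100)
print(J_first, J_last, theta.shape)
```

With `theta` initialised to zeros every prediction is 0.5, so the first reported cost should equal log(2); that is a quick sanity check that the cost formula is wired up correctly before comparing against the exercise's expected values.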