Tags: python, machine-learning, gradient-descent

Why is my gradient descent function giving me large negative values?


I am trying to code gradient descent in Python. The first code below plots the error function for the 2D (wx+b) and 1D (wx) cases. The second code is my gradient descent function, which is saved as a separate file, i.e. not in the main file. Based on the plots, the values for min_w and min_b should be positive, but the gradient descent function is giving very large negative values. In fact, when starting from w=0 and b=0, it goes negative right away (which corresponds to ascent on the error curve); the descent should be in the positive direction. I've tried different step sizes, numbers of iterations and initial values of w and b, but still no success. Please tell me where I am going wrong.

import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])
x_mean = np.mean(x_train)
x_std = np.std(x_train)
x_train_normalized = (x_train - x_mean) / x_std
y_train = np.array([250, 300, 480, 430, 630, 730])
w = np.random.rand(200) * 2 - 1
b = np.random.rand(200) * 2 - 1
W, B = np.meshgrid(w, b)
J = np.zeros((len(w), len(b)))
J1 = np.zeros(len(w))
y_bar = np.zeros((len(w), len(x_train)))
m = len(x_train)

x_new = x_train_normalized[:, np.newaxis, np.newaxis]
y_new = y_train[:, np.newaxis, np.newaxis]
y_bar = W * x_new + B
sum_squared_errors = np.sum((y_bar - y_new) ** 2, axis=0)
J = (1 / (2 * m)) * sum_squared_errors

from GD import gd2D
min_w, min_b = gd2D(y_new, x_new, m, np)

y_bar1 = W * x_new
sum_squared_errors1 = np.sum((y_bar1 - y_new) ** 2, axis=0)
J1 = (1 / (2 * m)) * sum_squared_errors1
       
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(W, B, J, cmap='viridis', edgecolor='none')
ax.set_xlabel('W-axis')
ax.set_ylabel('B-axis')
ax.set_zlabel('J-axis')
ax.view_init(elev=20, azim=-60)  # Adjust the viewing angle for better visualization
fig.colorbar(surf)  # Add a color bar to the side
plt.show()
       
plt.scatter(W, J1, marker='X', c='r', s=10)
plt.title("ABS")
plt.ylabel("J")
plt.xlabel("w")
plt.show()

The following is my gradient descent function.

def gd2D(y_new, x_new, m, np):
    itr = 5000
    alpha = 0.01
    w = 0
    b = 0
    for i in range(itr):
        y_bar = w * x_new + b
        dw = np.sum((y_new - y_bar) * x_new) / m
        db = np.sum(y_new - y_bar) / m
        w = w - alpha * dw
        b = b - alpha * db
    return w, b

Solution

  • The first issue I notice is a shape mismatch between x_train and w. I don't suggest using [:, np.newaxis, np.newaxis] to create extra dimensions, as it is expensive in memory.

    I'd like to go over the definition of a weight. The weight matrix W is a matrix of weights with shape (x, y), where x is the number of "nodes" in the next layer and y is the number of nodes in the previous layer. If we have a network with 2 layers (input and output, no hidden), with 10 input nodes and 2 output nodes, our weight matrix has shape (2, 10).

    We define it like this because, with input of shape (10,1) and output of shape (2,1), we can multiply weight @ input to get an output of shape (2,1).

    This works because when multiplying matrices:

    (2,10) @ (10,1) --> (2,1)
    

    and the "inner" dimensions are reduced, forming a matrix of the shapes of the "outer dimensions".
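    To make that shape reduction concrete, here is a minimal NumPy sketch using the same layer sizes as the example above (the array values are random placeholders):

```python
import numpy as np

W = np.random.rand(2, 10)   # weights: (nodes in next layer, nodes in previous layer)
x = np.random.rand(10, 1)   # input column vector
out = W @ x                 # inner dimensions (10 and 10) are reduced

print(out.shape)            # (2, 1)
```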

    Since you have massive shape differences between x and w, and you are using an element-wise operation, it is hard to determine the desired output shape. Your x has a size of 6, while your w has a size of 200.

    When you calculate your y_bar, the x_new * w operation returns an array of shape (6, 1, 200), which is clearly not a vector of predictions. This may be a reason why your gradient descent is not working properly.
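    You can check that broadcast shape directly; this sketch just mirrors the sizes in your code (6 training points, 200 weight samples) with random placeholder values:

```python
import numpy as np

x_new = np.random.rand(6)[:, np.newaxis, np.newaxis]  # shape (6, 1, 1)
w = np.random.rand(200)                               # shape (200,)

# Element-wise multiplication broadcasts (6, 1, 1) against (200,):
# trailing dimensions align, giving (6, 1, 200)
y_bar = x_new * w

print(y_bar.shape)  # (6, 1, 200)
```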