I have a dataset with 3 features and 1 target variable. I am trying to fit it with gradient descent and then minimize the RMSE.
When I run the code, I get nan as the cost/error term. I have tried a lot of things but can't figure it out.
Can anyone please tell me where I am going wrong with the calculation?
Here's the code:
import numpy as np

m = len(y)  # number of training samples

# gradient of the squared-error cost with respect to theta
def grad(theta):
    dJ = 1/m * np.sum((Xnorm.dot(theta) - ynorm.reshape(len(ynorm), 1)) * Xnorm, axis=0).reshape(-1, 1)
    return dJ

# squared-error cost, summed over all samples (note: not scaled by 1/(2m))
def cost(theta):
    J = np.sum((Xnorm.dot(theta) - ynorm.reshape(len(ynorm), 1))**2, axis=0)
    return J
def GD(theta0, learning_rate=0.0005, epochs=500, TOL=1e-1):
    theta_history = [theta0]
    J_history = [cost(theta0)]
    print(J_history)
    thetanew = theta0 * 10000  # placeholder value before the first update
    # print(f'epoch \t Cost(J) \t')
    for epoch in range(epochs):
        if epoch % 100 == 0:
            print('epoch', epoch, 'cost', J_history[-1])
        dJ = grad(theta0)  # gradient at the current theta
        J = cost(theta0)   # cost at the current theta
        thetanew = theta0 - learning_rate * dJ  # gradient descent step
        theta_history.append(thetanew)
        J_history.append(J)
        # stop once the parameter update is small enough
        if np.sum((thetanew - theta0)**2) < TOL:
            print('Convergence achieved.')
            break
        theta0 = thetanew
    return thetanew, theta_history, J_history
Even for the first theta value, the cost comes back as nan:

theta, theta_history, J_history = GD(theta0)
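One of the checks I tried was inspecting the inputs and the very first gradient step directly; this is a quick sketch using the same globals as above:

import numpy as np

# sanity checks on the inputs and the first step
print('nan in Xnorm?', np.isnan(Xnorm).any())
print('nan in ynorm?', np.isnan(ynorm).any())
print('cost at theta0:', cost(theta0))
print('first gradient:', grad(theta0).ravel())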
Shapes of my variables: (screenshot not reproduced here)
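Since the shapes aren't shown above, here is a minimal stand-in with made-up data (100 samples, 3 features plus a bias column; these sizes and values are assumptions, not my real dataset). It sets up the same globals that grad, cost, and GD expect:

import numpy as np

# made-up data standing in for my real dataset (shapes are assumptions)
rng = np.random.default_rng(0)
Xraw = rng.normal(size=(100, 3))
y = Xraw @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
m = len(y)

# normalise features and target, and add a bias column
Xnorm = (Xraw - Xraw.mean(axis=0)) / Xraw.std(axis=0)
Xnorm = np.hstack([np.ones((m, 1)), Xnorm])
ynorm = (y - y.mean()) / y.std()

theta0 = np.zeros((Xnorm.shape[1], 1))
theta, theta_history, J_history = GD(theta0)

On this toy data I would not expect a nan, which makes me suspect the scale of my real data.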
The only reasonable conclusion we came up with was that, since the cost was so high, this approach was not usable for this problem. We tried a different approach, simple linear regression, and it worked.
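For anyone curious, the approach that worked is roughly the following; I'm sketching it with numpy's least-squares solver on the same (assumed) normalised arrays, which may differ from the exact call we used:

import numpy as np

# closed-form least squares on the normalised data (sketch, not our exact code)
theta_ls, residuals, rank, sv = np.linalg.lstsq(Xnorm, ynorm, rcond=None)

# RMSE of the fitted model
rmse = np.sqrt(np.mean((Xnorm @ theta_ls - ynorm)**2))
print('theta:', theta_ls, 'RMSE:', rmse)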