I have a dataset with 3 features and 1 target variable. I am trying to fit it with gradient descent and then minimize the RMSE.
When I run the code, I get nan as the cost/error term. I have tried a lot of things but can't figure it out.
Can anyone please tell me where I am going wrong with the calculation?
Here's the code:
import numpy as np

m = len(y)  # number of training samples

# gradient of the squared-error cost with respect to theta
def grad(theta):
    dJ = 1/m * np.sum((Xnorm.dot(theta) - ynorm.reshape(len(ynorm), 1)) * Xnorm, axis=0).reshape(-1, 1)
    return dJ

# squared-error cost, summed over all samples (note: not scaled by 1/(2m))
def cost(theta):
    J = np.sum((Xnorm.dot(theta) - ynorm.reshape(len(ynorm), 1))**2, axis=0)
    return J
def GD(theta0, learning_rate=0.0005, epochs=500, TOL=1e-1):
    theta_history = [theta0]
    J_history = [cost(theta0)]
    print(J_history)
    thetanew = theta0 * 10000  # placeholder value before the first update
    # print(f'epoch \t Cost(J) \t')
    for epoch in range(epochs):
        if epoch % 100 == 0:
            print('epoch', epoch, 'cost', J_history[-1])
        dJ = grad(theta0)  # gradient at the current theta
        J = cost(theta0)   # cost at the current theta
        thetanew = theta0 - learning_rate * dJ  # gradient descent step
        theta_history.append(thetanew)
        J_history.append(J)
        # stop once the parameter update is small enough
        if np.sum((thetanew - theta0)**2) < TOL:
            print('Convergence achieved.')
            break
        theta0 = thetanew
    return thetanew, theta_history, J_history
Even for the first theta value, the cost comes back as nan:

theta, theta_history, J_history = GD(theta0)
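One of the checks I tried was inspecting the inputs and the very first gradient step directly; this is a quick sketch using the same globals as above:

import numpy as np

# sanity checks on the inputs and the first step
print('nan in Xnorm?', np.isnan(Xnorm).any())
print('nan in ynorm?', np.isnan(ynorm).any())
print('cost at theta0:', cost(theta0))
print('first gradient:', grad(theta0).ravel())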
Shapes of my variables: (screenshot not reproduced here)
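Since the shapes aren't shown above, here is a minimal stand-in with made-up data (100 samples, 3 features plus a bias column; these sizes and values are assumptions, not my real dataset). It sets up the same globals that grad, cost, and GD expect:

import numpy as np

# made-up data standing in for my real dataset (shapes are assumptions)
rng = np.random.default_rng(0)
Xraw = rng.normal(size=(100, 3))
y = Xraw @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
m = len(y)

# normalise features and target, and add a bias column
Xnorm = (Xraw - Xraw.mean(axis=0)) / Xraw.std(axis=0)
Xnorm = np.hstack([np.ones((m, 1)), Xnorm])
ynorm = (y - y.mean()) / y.std()

theta0 = np.zeros((Xnorm.shape[1], 1))
theta, theta_history, J_history = GD(theta0)

On this toy data I would not expect a nan, which makes me suspect the scale of my real data.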
The only reasonable conclusion we came up with was that, since the cost was so high, this approach was not usable for this problem. We tried a different approach, simple linear regression, and it worked.
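For anyone curious, the approach that worked is roughly the following; I'm sketching it with numpy's least-squares solver on the same (assumed) normalised arrays, which may differ from the exact call we used:

import numpy as np

# closed-form least squares on the normalised data (sketch, not our exact code)
theta_ls, residuals, rank, sv = np.linalg.lstsq(Xnorm, ynorm, rcond=None)

# RMSE of the fitted model
rmse = np.sqrt(np.mean((Xnorm @ theta_ls - ynorm)**2))
print('theta:', theta_ls, 'RMSE:', rmse)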