Tags: python, machine-learning, linear-regression, gradient-descent

Python gradient descent not converging


So I'm a newbie to machine learning and I have been trying to implement gradient descent. My code seems right (I think), but it doesn't converge to the global optimum.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def AddOnes(matrix):
    # Prepend a column of ones so the first weight acts as the intercept (bias)
    one = np.ones((matrix.shape[0], 1))
    X_bar = np.concatenate((one, matrix), axis=1)
    return X_bar


# Load data
df = pd.read_excel("Book1.xlsx", header=3)
X = np.array([df['Height']]).T
y = np.array([df['Weight']]).T

m = X.shape[0]
n = X.shape[1]
iterations = 30

# Build X_bar
X = AddOnes(X)

# Gradient descent
alpha = 0.00003
w = np.ones((n+1,1))
for i in range(iterations):
    h = np.dot(X, w)                     # current predictions: h = X_bar w
    w -= alpha/m * np.dot(X.T, h-y)      # batch update: w := w - (alpha/m) * X_bar^T (h - y)

print(w)

# Two end points of the fitted line for plotting
x0 = np.array([np.linspace(145, 185, 2)]).T
x0 = AddOnes(x0)
y0 = np.dot(x0, w)                       # predicted weight at each end point
x0 = np.linspace(145, 185, 2)            # drop the bias column again for plotting

# Visualizing
plt.plot(X[:, 1], y, 'ro')               # X was overwritten with X_bar, so plot only the height column
plt.plot(x0, y0)
plt.axis([140, 190, 40, 80])
plt.xlabel("Height(cm)")
plt.ylabel("Weight(kg)")
plt.show()

[Figure: visualizing the data (scatter plot of height vs. weight with the fitted line)]
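
As a side check (not part of the original post), the weights found by the loop above can be compared with NumPy's closed-form least-squares solution. This snippet is meant to run right after the script above, so it reuses X (which already contains the bias column) and the gradient-descent result w.

# Hedged side check: compare gradient descent with the closed-form least-squares fit
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves min ||X w - y||^2
print("gradient descent :", w.ravel())
print("least squares    :", w_closed.ravel())

If the two are close, the loop has already reached the best straight line possible for this data.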


Solution

  • You are using linear regression with a single neuron. A single neuron can only learn a straight line, irrespective of the dataset you provide: W acts as the slope, and your network has learnt the optimal W for your X, i.e. the W for which WX gives minimal error.

    The scatter plot (red dots) in the output shows your dataset values. You can observe that the dataset is not linear, so even if you train a million times, the algorithm will never fit the data exactly; the error cannot reach zero. But the learnt function is optimal for sure, as it is the straight line with minimal error.

    So I recommend you use multiple layers with non-linear activations like ReLU and Sigmoid, and a linear activation at the output, since you are predicting a real number (a sketch of this idea follows below).
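
    Below is a minimal sketch of that suggestion, assuming scikit-learn's MLPRegressor (the answer does not name a library, so this is just one way to do it) and the same Book1.xlsx file with 'Height' and 'Weight' columns as in the question.

    import numpy as np
    import pandas as pd
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    # Load the same data as in the question (assumed columns 'Height' and 'Weight')
    df = pd.read_excel("Book1.xlsx", header=3)
    X = df[['Height']].values        # shape (m, 1)
    y = df['Weight'].values          # shape (m,)

    # Standardising the input helps the optimiser converge
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Two hidden ReLU layers; MLPRegressor's output is linear by default,
    # which matches the advice to use a linear activation at the output
    model = MLPRegressor(hidden_layer_sizes=(16, 16), activation='relu',
                         max_iter=5000, random_state=0)
    model.fit(X_scaled, y)

    # Predict along a grid of heights, e.g. for plotting against the red dots
    x0 = np.linspace(145, 185, 100).reshape(-1, 1)
    y0 = model.predict(scaler.transform(x0))
    print(y0[:5])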