Tags: python, machine-learning, linear-regression, gradient-descent

Gradient descent - can I plot the function being minimized? (Linear regression)


I'm new to machine learning. I started with linear regression using gradient descent. I have Python code for this and I understand how it works. My question is: the gradient descent algorithm minimizes a function - can I plot that function? I want to see what the function whose minimum is being searched for looks like. Is that possible? My code:

import matplotlib.pyplot as plt
import numpy as np

def sigmoid_activation(x):
    return 1.0 / (1 + np.exp(-x))

X = np.array([
    [2.13, 5.49],
    [8.35, 6.74],
    [8.17, 5.79],
    [0.62, 8.54],
    [2.74, 6.92] ])

y = [0, 1, 1, 0, 0]

xdata = [row[0] for row in X]
ydata = [row[1] for row in X]

X = np.c_[np.ones((X.shape[0])), X]  # prepend a bias column of ones
W = np.random.uniform(size=(X.shape[1],))  # random initial weights

lossHistory = []


for epoch in np.arange(0, 5):

    # forward pass and prediction error
    preds = sigmoid_activation(X.dot(W))
    error = preds - y

    # sum-of-squared-errors loss
    loss = np.sum(error ** 2)
    lossHistory.append(loss)

    # gradient step with learning rate 0.44
    gradient = X.T.dot(error) / X.shape[0]
    W += -0.44 * gradient


plt.scatter(xdata, ydata)
plt.show()

plt.plot(np.arange(0, 5), lossHistory)
plt.show()

for i in np.random.choice(5, 5):

    activation = sigmoid_activation(X[i].dot(W))
    label = 0 if activation < 0.5 else 1
    print("activation={:.4f}; predicted_label={}, true_label={}".format(
        activation, label, y[i]))


# decision boundary: points where W[0] + W[1]*x + W[2]*y = 0
Y = (-W[0] - (W[1] * X)) / W[2]

plt.scatter(X[:, 1], X[:, 2], c=y)
plt.plot(X, Y, "r-")
plt.show()

Solution

  • At the risk of being obvious... you can simply plot lossHistory with matplotlib. Or am I missing something?

    EDIT: apparently the OP asked what function Gradient Descent (GD) is minimizing. I will try to answer that here, and I hope it answers the original question.

    The GD algorithm is a generic algorithm for finding the minimum of a function in parameter space. In your case (and this is how it is usually used with neural networks) you want to find the minimum of a loss function: the MSE (Mean Squared Error). You implement the GD algorithm by updating the weights as you did with

    gradient = X.T.dot(error) / X.shape[0]
    W += - 0.44 * gradient
    
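    To actually see the function that GD is walking down, you can evaluate the loss on a grid of weight values and plot it. Below is a minimal sketch, not your exact setup: it reuses the data from your question, fixes the bias weight at 0 so that the two remaining weights span a 2-D plane, and the grid range of ±3 is an arbitrary choice.

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid_activation(x):
        return 1.0 / (1 + np.exp(-x))

    # same data as in the question
    X = np.array([[2.13, 5.49], [8.35, 6.74], [8.17, 5.79],
                  [0.62, 8.54], [2.74, 6.92]])
    y = np.array([0, 1, 1, 0, 0])
    Xb = np.c_[np.ones((X.shape[0],)), X]  # prepend a bias column of ones

    def loss(w):
        # the sum-of-squared-errors loss from the question
        preds = sigmoid_activation(Xb.dot(w))
        return np.sum((preds - y) ** 2)

    # evaluate the loss over a grid of the two non-bias weights
    w1 = np.linspace(-3, 3, 100)
    w2 = np.linspace(-3, 3, 100)
    Z = np.array([[loss(np.array([0.0, a, b])) for a in w1] for b in w2])

    plt.contourf(w1, w2, Z, levels=30)
    plt.colorbar(label="loss")
    plt.xlabel("W[1]")
    plt.ylabel("W[2]")
    plt.title("Loss surface with the bias weight fixed at 0")
    plt.show()

    Every point in that plot is the loss you would get with those weights; GD simply walks downhill on this surface.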

    The gradient is just the partial derivative of your loss function (the MSE) with respect to the weights, so you are effectively minimizing the loss function (the MSE). You then update your weights using a learning rate of 0.44. Finally, you save the value of your loss function in the array with

    loss = np.sum(error ** 2)
    lossHistory.append(loss)
    
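    As a sanity check that the analytic gradient really is the derivative of a loss, you can compare it against a finite-difference approximation. Here is a minimal sketch with made-up data; it uses a plain linear model (no sigmoid), for which 2/n * X.T.dot(error) is exactly the gradient of the MSE:

    import numpy as np

    rng = np.random.RandomState(0)
    Xb = np.c_[np.ones((5,)), rng.uniform(0, 10, size=(5, 2))]  # bias column + 2 features
    y = np.array([0.0, 1.0, 1.0, 0.0, 0.0])
    W = rng.uniform(size=(Xb.shape[1],))

    def mse(w):
        return np.mean((Xb.dot(w) - y) ** 2)

    # analytic gradient of the MSE for a linear model
    analytic = 2.0 * Xb.T.dot(Xb.dot(W) - y) / Xb.shape[0]

    # central finite differences: dL/dw_j ~ (L(w + eps) - L(w - eps)) / (2 * eps)
    eps = 1e-6
    numeric = np.zeros_like(W)
    for j in range(len(W)):
        step = np.zeros_like(W)
        step[j] = eps
        numeric[j] = (mse(W + step) - mse(W - step)) / (2 * eps)

    print(analytic)  # the two printouts should agree to ~6 decimal places
    print(numeric)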

    and therefore the lossHistory array contains the values of your cost (or loss) function, which you can plot to check your learning process. The plot should show something decreasing. Does this explanation help?
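    One more note: with only 5 epochs the curve is very short; running more iterations makes the decreasing trend much easier to see. A small sketch (the epoch count of 100 is an arbitrary choice), assuming X, y, W and sigmoid_activation are set up as in your question:

    lossHistory = []
    for epoch in range(100):  # more epochs than the 5 in the question
        preds = sigmoid_activation(X.dot(W))
        error = preds - y
        lossHistory.append(np.sum(error ** 2))
        W += -0.44 * X.T.dot(error) / X.shape[0]

    plt.plot(lossHistory)
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()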

    Best, Umberto