python numpy regression least-squares

Can't figure out how to print the least squares error


I wrote some code to find the best fitting line for a couple of data points using the analytical solution to least squares. Now I would like to print the error between the actual data and my estimated line, but I have no idea how to compute it. Here is my code:

import numpy as np
import matplotlib.pyplot as plt

A = np.array(((0,1),
             (1,1),
             (2,1),
             (3,1)))

b = np.array((1,2,0,3), ndmin = 2 ).T

xstar = np.matmul( np.matmul( np.linalg.inv( np.matmul(A.T, A) ), A.T), b)

print(xstar)

plt.scatter(A.T[0], b)
u = np.linspace(0,3,20)
plt.plot(u, u * xstar[0] + xstar[1], 'b-')

Solution

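  • The plotted line is evaluated on the 20-point grid u, so for the error you need the predictions at the x-values of your data instead. np.matmul(A, xstar) gives exactly those predictions, with the same (4, 1) shape as b, so the subtraction lines up element by element. From there you can calculate the "sum of squared errors (SSE)" or the "mean squared error (MSE)" as follows:

    y_prediction = np.matmul(A, xstar)  # predictions at the data points, shape (4, 1)
    SSE = np.sum(np.square(y_prediction - b))
    MSE = np.mean(np.square(y_prediction - b))
    print(SSE)
    print(MSE)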
    

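    As an aside, you might want to use np.linalg.pinv: np.matmul(np.linalg.pinv(A), b) gives the same least-squares solution through an SVD-based pseudo-inverse, without explicitly inverting np.matmul(A.T, A), which is numerically more stable.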
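Putting the pieces together, here is a minimal sketch that uses the data from the question, solves with np.linalg.pinv, and computes the errors at the data points. The cross-check with np.linalg.lstsq is an addition that is not part of the original answer, and the variable names residuals, sse, mse, xstar_lstsq and sse_lstsq are chosen here for illustration.

```python
import numpy as np

# Data from the question: column 0 holds x, column 1 is the constant term.
A = np.array([[0, 1], [1, 1], [2, 1], [3, 1]], dtype=float)
b = np.array([[1], [2], [0], [3]], dtype=float)

# Least-squares solution via the pseudo-inverse (no explicit inverse of A.T @ A).
xstar = np.linalg.pinv(A) @ b

# Residuals at the actual data points; both terms have shape (4, 1).
residuals = A @ xstar - b
sse = float(np.sum(residuals ** 2))
mse = float(np.mean(residuals ** 2))

# Cross-check: lstsq returns the sum of squared residuals as its second value
# (a length-1 array here, because b has a single column).
xstar_lstsq, sse_lstsq, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(xstar.ravel())  # slope, intercept
print(sse, mse)
print(sse_lstsq)
```

For the four points in the question this should give a slope of about 0.4, an intercept of about 0.9, an SSE of about 4.2 and an MSE of about 1.05, with lstsq reporting the same SSE.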