I am finding the least-squares fit for a linear, a quadratic, and a cubic function, and I am trying to print their errors. Everything runs fine, but I don't understand why the errors increase even though each fit looks better than the last. Am I computing the error the wrong way? Here are the plots; my code follows:
This is the code that produces the cubic fit, for example.
import numpy as np
import matplotlib.pyplot as plt
A = np.array(((0, 1),
              (1, 1),
              (2, 1),
              (3, 1)))
xfeature = A.T[0]
squaredfeature = A.T[0] ** 2
cubedfeature = A.T[0] ** 3
ones = np.ones(4)
b = np.array((1,2,0,3), ndmin=2 ).T
b = b.reshape(4)
order = 3
features = np.concatenate((np.vstack(ones), np.vstack(xfeature), np.vstack(squaredfeature), np.vstack(cubedfeature)), axis = 1)
xstar = np.matmul( np.matmul( np.linalg.inv( np.matmul(features.T, features) ), features.T), b)
plt.scatter(A.T[0],b, c = 'red')
u = np.linspace(0,3,1000)
plt.plot(u, u**3*xstar[3] + u**2*xstar[2] + u*xstar[1] + xstar[0], 'b-')
plt.show()
b = np.array((1,2,0,3), ndmin=2 ).T
y_prediction = u**3*xstar[3] + u**2*xstar[2] + u*xstar[1] + xstar[0]
SSE = np.sum(np.square(y_prediction - b))
MSE = np.mean(np.square(y_prediction - b))
print("Sum of squared errors:", SSE)
print("Mean squared error:", MSE)
I think it's just a small mistake in your last block of code: you are computing the errors along the entire plotting line u (1000 points) instead of just at the data points. What you want is the distance at each of the four points, so y_prediction and b need to have the same shape:
b = np.array((1,2,0,3))
y_prediction = xfeature**3*xstar[3] + xfeature**2*xstar[2] + xfeature*xstar[1] + xstar[0]
SSE = np.sum(np.square(y_prediction - b))
MSE = np.mean(np.square(y_prediction - b))
print("Sum of squared errors:", SSE)
print("Mean squared error:", MSE)
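As an aside, you don't have to build the normal equations and invert features.T @ features by hand; np.linalg.lstsq solves the same least-squares problem and is numerically safer than an explicit inverse. A sketch with your data (np.vander builds the same [1, x, x**2, x**3] design matrix):

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=float)
b = np.array([1, 2, 0, 3], dtype=float)

# Columns [1, x, x**2, x**3], same as your concatenated features matrix
features = np.vander(x, 4, increasing=True)

# Least-squares solve; no explicit matrix inverse needed
xstar, residuals, rank, _ = np.linalg.lstsq(features, b, rcond=None)

y_prediction = features @ xstar
SSE = np.sum(np.square(y_prediction - b))
print("Sum of squared errors:", SSE)
```

With four points and a cubic, the fit interpolates the data exactly, so the SSE here comes out essentially zero.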
Was that what you were after?
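As a quick sanity check (not part of your code), you can also let np.polyfit do the fitting for each degree and compute the SSE at the data points only; once the error is evaluated at the points rather than along u, it decreases as the degree goes up, as you expected:

```python
import numpy as np

x = np.array([0, 1, 2, 3], dtype=float)
b = np.array([1, 2, 0, 3], dtype=float)

sse_by_degree = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, b, degree)      # least-squares fit of that degree
    y_prediction = np.polyval(coeffs, x)   # evaluate only at the data points
    sse_by_degree[degree] = np.sum(np.square(y_prediction - b))
    print(degree, sse_by_degree[degree])
```

For these four points the SSE drops from 4.2 (linear) to 3.2 (quadratic) to essentially 0 (cubic, which interpolates exactly).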