I am trying to fit a linear system of polynomials to data. numpy
's polynomial
module has a fitting function included, which works perfectly. When I try to fit the model with an sklearn
linear solver, the fit is terrible! I don't understand what is going wrong. I construct a matrix X where x_{ij} corresponds to the i
th observed input and the j
th polynomial. I know the X matrix is OK because, when I find the coefficients with numpy
, the data fits perfectly. I use sklearn's fit
function (I have tried several linear solvers), but the coefficients it solves for (the coef_
object) are just wrong. What am I doing wrong? How can I make the coefficients found by the sklearn
linear solver match the coefficients found by numpy
?
import numpy as np
from sklearn import linear_model
from sklearn.linear_model import OrthogonalMatchingPursuit
import matplotlib.pyplot as plt
# accept x and polynomial order, return basis of that order
def legs(x, c):
s = np.zeros(c + 1)
s[-1] = 1
return np.polynomial.legendre.legval(x, s)
# Generate normalized samples
samples = np.random.uniform(2, 3, 5)
evals = samples ** 2
xnorm = (samples - 2) * 2 / (3 - 2) - 1
# instantiate linear regressor
omp = linear_model.LinearRegression()
#omp = linear_model.Lasso(alpha=0.000001)
#omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2)
# construct X matrix. Each row is an observed value.
# Each column is a different polynomial.
X = np.array([[legs(xnorm[jj], ii) for ii in range(5)] for jj in range(xnorm.size)])
# Perform the fit. Why isn't this working?
omp.fit(X, evals)
# Plot the truth data
plt.scatter(xnorm, evals, label='data', s=15, marker='x')
# Dot the coefficients found with sklearn against X
plt.scatter(xnorm, omp.coef_.dot(X.T), label='linear regression')
# Dot the coefficients found with numpy against X
plt.scatter(xnorm, np.polynomial.legendre.legfit(xnorm, evals, 4).dot(X.T), label='Numpy regression')
# complete the plot
plt.legend(ncol=3, prop={'size':3})
plt.savefig('simpleExample')
plt.clf()
Your omp.coef_.dot(X.T)
doesn't include the intercept; add that manually or simply use omp.predict
directly.
I.e.:
plt.scatter(xnorm, omp.coef_.dot(X.T) + omp.intercept_, label='linear regression')
plt.scatter(xnorm, evals, label='data', s=15, marker='x')
or
plt.scatter(xnorm, omp.predict(X), label='linear regression')
plt.scatter(xnorm, evals, label='data', s=15, marker='x')