Search code examples
pythonscikit-learnpolynomials

Not getting sklearn polynomial regression right


I understood the principle, but when I try it in action I got something weird, Can someone explain me what am i doing wrong ?

So I use this dataset

With a simple code like :

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Importing the dataset
df = pd.read_csv('data.csv')

X = df.values[:, 0:1]
y = df.values[:, 1:2]

poly_reg = PolynomialFeatures(degree=6)
X_poly = poly_reg.fit_transform(X)
lin_reg_poly = LinearRegression().fit(X_poly, y)

plt.scatter(X, y, color='blue')
plt.plot(X, lin_reg_poly.predict(poly_reg.fit_transform(X)), color='r')
plt.show()

Unfortunately it produces : graph

I don't know what I am doing wrong.


Solution

  • You might try spline interpolation. Here is my approach to this dataset:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    
    from sklearn.linear_model import Ridge
    from sklearn.preprocessing import PolynomialFeatures, SplineTransformer
    from sklearn.pipeline import make_pipeline
    
    df = pd.read_csv('data.csv')
    
    x = df.values[:, 0:1]
    y = df.values[:, 1:2]
    
    plt.figure()
    plt.scatter(x, y, s=2)
    
    transformer = SplineTransformer(degree=3, n_knots=10)
    model = make_pipeline(transformer, Ridge(alpha=1e-3))
    model.fit(x, y)
    
    y_plot = model.predict(x)
    plt.plot(x, y_plot, label="B-spline", color='red')
    plt.show()
    

    result

    You can adjust n_knots to make it fitting the points better.