python, scikit-learn, regression

Correct polynomial regression formula using sklearn


I am performing multiple polynomial regression using sklearn. What I cannot work out is how to get the full polynomial formula. Is the order of the values printed in coef_ correct? I am trying to put together the regression equation, but none of my attempts reproduce the predicted values.

Here is my code, which prints the predicted values, coefficients and intercept.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# df is a pandas DataFrame with columns 'Y', 'X1' and 'X2'
Y = df['Y']
X = df[['X1', 'X2']]

# Expand X1 and X2 into degree-2 polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
lin2 = LinearRegression()
model = lin2.fit(X_poly, Y)
y_pred = model.predict(X_poly)

print(y_pred)

print('Regression coefficients: ', model.coef_)
print('Intercept: ', model.intercept_)

Regression coefficients: [0.0 -3.9407245056806457 63.36152983871869 -0.0073134316780316105 0.28728821270355437 -1.8955885488237727 -317.773937549386]
Intercept:  40.587981548779965

Let's say that X1 = 167.8 and X2 = 22.348595. The regression predicts 361.67 for this row, but none of the versions of the equation I have tried gives 361.67.

I read that coef_ prints [1, a, b, c, a^2, b^2, c^2, ab, bc, ca], so in this case [1, a, b, a^2, b^2, ab], but I am not sure that this sequence is correct. With the equation below I get 370.56 instead of 361.67:

y = 0.0 + -3.94 * X1 + 63.36 * X2 + -0.007 * X1^2 + 0.2872 * X1 * X2 + -1.895 * X2^2 + -317.77

Solution

  • I do not believe there is anything wrong with the formula or the order of the terms. The difference comes from rounding the coefficients, which shifts the result by much more than you might expect because the coefficients are multiplied by large values. For example, X1^2 is about 28,157, so rounding -0.0073134... to -0.007 alone moves the result by about 8.8.

    If you keep all the decimal places of the regression coefficients, in the order you already have, you should get the predicted value of 361.67.

    Please let me know if anything is wrong or if I have misinterpreted the issue.

    For example:

    
    X1 = 167.8
    X2 = 22.348595

    y = (0.0
         + -3.9407245056806457 * X1
         + 63.36152983871869 * X2
         + -0.0073134316780316105 * X1**2
         + 0.28728821270355437 * X1 * X2
         + -1.8955885488237727 * X2**2
         - 317.773937549386)

    print(y)
    

    Output:

    361.67832067451957
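
To check both the term order and the rounding effect in one place, here is a minimal sketch in plain Python. It assumes the term order that the PolynomialFeatures documentation gives for two inputs at degree 2, namely [1, a, b, a^2, ab, b^2], and reproduces it with itertools so that it runs without the original data. Following the formula above, the trailing value -317.77... is treated as the constant term. The names `terms`, `labels` and `evaluate` are illustrative only and are not part of sklearn.

```python
from itertools import combinations_with_replacement

# Input values copied from the question.
values = {"X1": 167.8, "X2": 22.348595}
names = list(values)

# Enumerate the terms degree by degree (0, 1, 2). Assumption: this mirrors the
# documented PolynomialFeatures order for two inputs: 1, a, b, a^2, ab, b^2.
terms = [combo
         for degree in range(3)
         for combo in combinations_with_replacement(names, degree)]
labels = ["*".join(term) or "1" for term in terms]

# Coefficients from the question, at full precision and as rounded there.
full = [0.0, -3.9407245056806457, 63.36152983871869,
        -0.0073134316780316105, 0.28728821270355437, -1.8955885488237727]
full_constant = -317.773937549386
rounded = [0.0, -3.94, 63.36, -0.007, 0.2872, -1.895]
rounded_constant = -317.77


def evaluate(coefs, constant):
    """Sum coefficient * term value over all terms, plus the constant."""
    total = constant
    for coef, term in zip(coefs, terms):
        product = 1.0
        for name in term:  # the empty tuple is the bias term, product stays 1
            product *= values[name]
        total += coef * product
    return total


y_full = evaluate(full, full_constant)
y_rounded = evaluate(rounded, rounded_constant)

print(labels)
print(round(y_full, 2), round(y_rounded, 2))
```

With the full-precision coefficients this gives the 361.68 reported above, and with the rounded ones it gives the 370.56 from the question, so the gap is explained by rounding alone. On a fitted transformer, recent scikit-learn versions also provide `poly.get_feature_names_out()`, which lists the terms in the same order as `coef_`.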