python, machine-learning, scikit-learn, polynomials, grid-search

GridSearchCV for Polynomial Regression


I am new to machine learning and stuck on this.

I am trying to implement polynomial regression with a linear model, fitting several polynomial degrees in range(1, 10) and comparing the resulting MSE. I want to use GridSearchCV to find the best parameters for the polynomial.

from sklearn.model_selection import GridSearchCV

poly_grid = GridSearchCV(PolynomialRegression(), param_grid, cv=10, scoring='neg_mean_squared_error')

I don't know how to get the PolynomialRegression() estimator used above. One solution I found was:

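import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def PolynomialRegression(degree=2, **kwargs):
    # Expand the inputs to polynomial features, then fit ordinary least squares.
    return make_pipeline(PolynomialFeatures(degree), LinearRegression(**kwargs))

# Step names are the lowercased class names assigned by make_pipeline.
# Note: 'normalize' was removed from LinearRegression in scikit-learn 1.2.
param_grid = {'polynomialfeatures__degree': np.arange(10), 'linearregression__fit_intercept': [True, False], 'linearregression__normalize': [True, False]}

poly_grid = GridSearchCV(PolynomialRegression(), param_grid, cv=10, scoring='neg_mean_squared_error')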

But it didn't even generate any result.


Solution

  • poly_grid = GridSearchCV...

    only declares and instantiates the grid search object. You need to pass some data to its fit() method for any training or hyper-parameter search to happen.

    Something like this:

    poly_grid.fit(X, y)
    

    Where X and y are your training data and labels.
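
To make the point concrete, here is a minimal end-to-end sketch. The synthetic cubic data, the random seed, and the smaller grid are assumptions chosen only so the example runs quickly; they are not from the question. The normalize option is left out because recent scikit-learn versions no longer accept it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def PolynomialRegression(degree=2, **kwargs):
    # Pipeline: polynomial feature expansion followed by linear regression.
    return make_pipeline(PolynomialFeatures(degree), LinearRegression(**kwargs))


# Synthetic data (an assumption for illustration): a noisy cubic in one feature.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=200)

# Smaller grid than in the question so the search finishes quickly.
param_grid = {
    "polynomialfeatures__degree": np.arange(1, 6),
    "linearregression__fit_intercept": [True, False],
}

poly_grid = GridSearchCV(
    PolynomialRegression(), param_grid, cv=5, scoring="neg_mean_squared_error"
)

# Nothing has been trained yet; the search only runs when fit() is called.
poly_grid.fit(X, y)

print(poly_grid.best_params_)
```

Before the fit() call, attributes such as best_params_ do not exist on the object, which is why the code in the question appears to produce nothing.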

    Please see the documentation:

    fit(X, y=None, groups=None, **fit_params)

    Run fit with all sets of parameters.
    

    Then use the cv_results_ and/or best_params_ attributes to analyse the results.
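
As a rough illustration of reading those attributes, the sketch below fits a small search and prints the mean cross-validated error per candidate. The quadratic toy data and the variable name search are assumptions for the example. Note that with scoring='neg_mean_squared_error' the scores are negated, so the sign is flipped before printing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data, assumed here only so the sketch runs.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(150, 1))
y = 1.0 + 2.0 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=150)

search = GridSearchCV(
    make_pipeline(PolynomialFeatures(), LinearRegression()),
    {"polynomialfeatures__degree": [1, 2, 3, 4]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per parameter combination.
results = search.cv_results_
for params, mean_score in zip(results["params"], results["mean_test_score"]):
    # Scores are negated MSE, so flip the sign to read them as errors.
    print(params, "mean CV MSE:", -mean_score)

print("best params:", search.best_params_)
print("best CV MSE:", -search.best_score_)
```

If pandas is available, wrapping cv_results_ in a DataFrame is a common way to sort and compare the candidates.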

    Responding to comment:

    @BillyChow Did you call poly_grid.fit() or not? If not, then it obviously won't produce any result.

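    If yes, then depending on your data it can take a long time: np.arange(10) asks for degrees 0 through 9, and together with the two boolean options that is 40 parameter combinations, each fitted 10 times under 10-fold CV (400 fits, plus a final refit). The number of polynomial features also grows with the degree, so each fit and cross-validation gets slower as the degree increases.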
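
One way to gauge the amount of work before starting a search is to count the candidates with ParameterGrid. This is only a sketch: the grid is copied from the question, and the variable names n_candidates, n_folds and n_fits are made up for the example. ParameterGrid just enumerates combinations, so it does not check whether the estimator accepts each parameter.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Grid from the question; only the number of combinations matters here.
param_grid = {
    "polynomialfeatures__degree": np.arange(10),  # degrees 0, 1, ..., 9
    "linearregression__fit_intercept": [True, False],
    "linearregression__normalize": [True, False],
}

n_candidates = len(ParameterGrid(param_grid))  # 10 * 2 * 2 combinations
n_folds = 10                                   # cv=10 in the question
n_fits = n_candidates * n_folds                # excludes the final refit

print(n_candidates, "candidates x", n_folds, "folds =", n_fits, "fits")
```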

    If you still want to see progress while it runs, you can add the verbose parameter to GridSearchCV, like this:

    poly_grid = GridSearchCV(PolynomialRegression(), param_grid, 
                             cv=10, 
                             scoring='neg_mean_squared_error', 
                             verbose=3) 
    

    And then call poly_grid.fit(X, y)