
Getting error while trying to fit to GridSearchCV


I am trying to fit a ridge regression model to my data using a pipeline and GridSearchCV.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

X = transformed_data.iloc[:, :-1]
y = transformed_data['class'] 

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)

params = {}
params['ridge__alpha'] = np.arange(0, 100, 1).tolist()

t = [('labelenc',LabelEncoder() , [0]), ('stand', StandardScaler(), [1,2,3,4,5,6]), ('poly'),PolynomialFeatures(degree=2),[1,2,3,4,5,6] ]
transformer = ColumnTransformer(transformers=t)


pipe = Pipeline(steps=[('t', transformer), ('m',Ridge())])


#grid_ridge2_r2 = GridSearchCV(pipe, params, cv=10, scoring='r2', n_jobs=-1)
#results_ridge2_r2 = grid_ridge2_r2.fit(X_train,y_train)

grid_ridge2_rmse = GridSearchCV(pipe, params, cv=10, scoring='neg_root_mean_squared_error', n_jobs=-1)
results_ridge2_rmse = grid_ridge2_rmse.fit(X_train,y_train)

I keep getting

ValueError: too many values to unpack (expected 3)

in the last line grid_ridge2_rmse.fit(X_train,y_train). My intuition is that there is something wrong with how I am splitting the dataset.


Solution

  • There are a few errors in your pipeline.
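
    The error message itself can most likely be traced to the t list in the question: ('poly') is a plain string rather than the start of a tuple, so the list holds five entries instead of three triples. The sketch below reproduces that unpacking failure in plain Python. The string placeholders and the unpack_all helper are assumptions made for illustration, and where scikit-learn performs this unpacking depends on the version.

```python
# Parentheses alone do not create a tuple: ('poly') is just the string 'poly'.
assert ("poly") == "poly"

# The list from the question, with strings standing in for the scikit-learn
# objects (placeholders, so the sketch runs without any dependencies).
t = [
    ("labelenc", "LabelEncoder()", [0]),
    ("stand", "StandardScaler()", [1, 2, 3, 4, 5, 6]),
    ("poly"), "PolynomialFeatures(degree=2)", [1, 2, 3, 4, 5, 6],
]
n_entries = len(t)  # five entries instead of the intended three


def unpack_all(transformers):
    # Hypothetical helper: reads every entry as (name, transformer, columns).
    return [(name, trans) for name, trans, _ in transformers]


try:
    unpack_all(t)
    message = None
except ValueError as exc:
    # The 4-character string 'poly' cannot be unpacked into three names.
    message = str(exc)

print(n_entries, message)
```

    Moving the closing parenthesis to the end of the entry, as in the corrected code further down, turns it back into a proper three-element tuple.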

    First, LabelEncoder cannot be used inside a scikit-learn pipeline because it is designed to encode the target y, not the features X. Assuming you want to encode a categorical feature, replace it with OrdinalEncoder.

    Then, grid parameters have to follow the naming convention <step>__<hyperparameter>. Because the Ridge step is named m in your pipeline, the parameter should be m__alpha, not ridge__alpha.

    The pipeline parameters can be seen using pipe.get_params().
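
    As a minimal sketch of that check, the snippet below builds a reduced pipeline and filters the keys returned by get_params(). The 'scale' step is only a stand-in for the ColumnTransformer (an assumption to keep the example short); the Ridge step keeps the name m from the question.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Reduced pipeline: 'scale' is a stand-in step, 'm' is the Ridge step name
# used in the question.
pipe = Pipeline(steps=[("scale", StandardScaler()), ("m", Ridge())])

# get_params() lists every name that GridSearchCV will accept as a grid key.
param_names = pipe.get_params()
alpha_keys = [name for name in param_names if name.endswith("alpha")]

print(alpha_keys)
print("ridge__alpha" in param_names)
```

    Only names that appear in this listing are valid keys for the params dictionary.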

    I would do as follows:

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import PolynomialFeatures, OrdinalEncoder, StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    import numpy as np
    
    # X and y are defined as in the question
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)
    
    params = {'m__alpha' : np.arange(0, 100, 1).tolist()}
    
    t = [
        ('labelenc',OrdinalEncoder() , [0]),
        ('stand', StandardScaler(), [1,2,3,4,5,6]),
        ('poly', PolynomialFeatures(degree=2), [1,2,3,4,5,6])
    ]
    
    transformer = ColumnTransformer(transformers=t)
    
    
    pipe = Pipeline(steps=[('t', transformer), ('m',Ridge())])
    
    grid_ridge2_rmse = GridSearchCV(pipe, params, cv=10, scoring='neg_root_mean_squared_error', n_jobs=-1)
    results_ridge2_rmse = grid_ridge2_rmse.fit(X_train,y_train)
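
Once the search has been fitted, the selected alpha and its score can be read from the fitted object. The sketch below is a reduced version of the setup above, under stated assumptions: synthetic data from make_regression replaces transformed_data, a single StandardScaler step replaces the ColumnTransformer, and the grid is shortened to three values so it runs quickly. Because the scorer is neg_root_mean_squared_error, best_score_ is a negated RMSE and the sign has to be flipped.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the data in the question (assumption).
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Simplified pipeline; the Ridge step keeps the name 'm' used above.
pipe = Pipeline(steps=[("scale", StandardScaler()), ("m", Ridge())])
params = {"m__alpha": [0.1, 1.0, 10.0]}

grid = GridSearchCV(pipe, params, cv=5, scoring="neg_root_mean_squared_error")
grid.fit(X_train, y_train)

best_alpha = grid.best_params_["m__alpha"]
best_rmse = -grid.best_score_  # scorer returns negated RMSE, so flip the sign

print(best_alpha, best_rmse)
```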