Minimum viable example included ;)
What I want to do is simply use the best parameters found by GridSearchCV in a Pipeline.
#I want to create a SVM using a Pipeline, and validate the model (measure the accuracy)
#import libraries
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
#load test data
data = load_iris()
X_trainset, X_testset, y_trainset, y_testset = train_test_split(data['data'], data['target'], test_size=0.2)
#And here we prepare the pipeline
pipeline = Pipeline([('scaler', StandardScaler()), ('SVM', SVC())])
grid = GridSearchCV(pipeline, param_grid={'SVM__gamma':[0.1,0.01]}, cv=5)
grid.fit(X_trainset, y_trainset)
# (Done! Now I can print the accuracy and other metrics)
#Now I want to put together training set and validation set, to train the model before deployment
#Of course, I want to use the best parameters found by GridSearchCV
big_x = np.concatenate([X_trainset,X_testset])
big_y = np.concatenate([y_trainset,y_testset])
Up to here, it works with no problem. Then, I write this line:
model2 = pipeline.fit(big_x,big_y, grid.best_params_)
TypeError: fit() takes from 2 to 3 positional arguments but 4 were given
Then I tried to be more explicit:
model2 = pipeline.fit(big_x,big_y,fit_params=grid.best_params_)
Error again!
ValueError: Pipeline.fit does not accept the fit_params parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.
Then, out of curiosity, I tried passing the parameter manually:
pipeline.fit(big_x,big_y, SVM__gamma= 0.01) #Note: I may need to insert many parameters, not just one
Error again :(
TypeError: fit() got an unexpected keyword argument 'gamma'
I cannot understand why it does not find gamma. I decided to print pipeline.get_params() to have an idea.
In [11]: print(pipeline.get_params())
Out [11]:
{'memory': None,
'steps': [('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('SVM', SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False))],
'verbose': False,
'scaler': StandardScaler(copy=True, with_mean=True, with_std=True),
'SVM': SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False),
'scaler__copy': True, 'scaler__with_mean': True, 'scaler__with_std': True, 'SVM__C': 1.0, 'SVM__break_ties': False, 'SVM__cache_size': 200, 'SVM__class_weight': None, 'SVM__coef0': 0.0, 'SVM__decision_function_shape': 'ovr', 'SVM__degree': 3, 'SVM__gamma': 'scale', 'SVM__kernel': 'rbf', 'SVM__max_iter': -1, 'SVM__probability': False, 'SVM__random_state': None, 'SVM__shrinking': True, 'SVM__tol': 0.001, 'SVM__verbose': False}
I can find SVM__gamma in the list! So why is there an error?
Version of Scikit: 0.22.1
Version of python: 3.7.6
.fit(), as in the call to the .fit() function of the SVC class, has no parameter called gamma. When you call pipeline.fit(big_x, big_y, SVM__gamma=0.01), it passes the gamma param to the .fit() call of the SVM step, which isn't going to work.
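To make the routing concrete, here is a minimal sketch, assuming the same scaler + SVC pipeline as in the question. The all-ones `weights` array is only an illustration: `sample_weight` is a real argument of `SVC.fit()`, so it can be routed through `pipeline.fit()`, whereas `gamma` is a constructor hyperparameter and cannot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipeline = Pipeline([('scaler', StandardScaler()), ('SVM', SVC())])

# sample_weight is an argument of SVC.fit(), so routing it to the SVM step works.
weights = np.ones(len(y))  # illustrative weights, not from the original post
pipeline.fit(X, y, SVM__sample_weight=weights)

# gamma is not an argument of SVC.fit(), so the same routing fails.
try:
    pipeline.fit(X, y, SVM__gamma=0.01)
    gamma_error = None
except TypeError as exc:
    gamma_error = exc

print(type(gamma_error).__name__, gamma_error)
```

In other words, keyword arguments of `pipeline.fit()` are fit-time arguments for a step, not hyperparameters of that step.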
You set params in scikit-learn using the .set_params() functions. At the lowest level (i.e. against SVC itself) you can just do SVC().set_params(gamma='blah'). In the pipeline you'd follow the same double underscore notation you're using in the param grid, so pipeline.set_params(SVM__gamma='blah').
If you're only setting a single param against a single step of your pipeline, it's usually convenient to access the step directly with pipeline.named_steps.SVM.set_params(gamma='blah'), or else use pipeline.set_params(**grid.best_params_) to use your grid search's best params. (The ** notation explodes a dict of {'A':1, 'B':2} out into A=1, B=2 for you.)
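As a rough illustration of the two routes just described, the sketch below uses the question's pipeline. The `best_params` dict is a hypothetical stand-in for `grid.best_params_`, and its values are made up for the example.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([('scaler', StandardScaler()), ('SVM', SVC())])

# Hypothetical stand-in for grid.best_params_ (values chosen for illustration).
best_params = {'SVM__gamma': 0.01, 'SVM__C': 10.0}

# ** unpacks the dict, so this is the same call as
# pipeline.set_params(SVM__gamma=0.01, SVM__C=10.0)
pipeline.set_params(**best_params)

# Setting one parameter on one step by accessing the step directly.
pipeline.named_steps.SVM.set_params(kernel='rbf')

svm_params = pipeline.named_steps.SVM.get_params()
print(svm_params['gamma'], svm_params['C'], svm_params['kernel'])
```

Both routes change the same underlying estimator, so `pipeline.get_params()` reflects either one.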
Here's a snippet of a script that does what I think you're trying to do (albeit with different algorithms):
from scipy import stats
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
# preprocessor, df and y are defined earlier in the script.
# Set the classifier as an XGBClassifier
clf_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', XGBClassifier(n_jobs=6, n_estimators=20))
])
# In[41]:
# Cross validation: 60 iterations with 3 fold CV.
n_features_after_transform = clf_pipeline.named_steps.preprocessor.fit_transform(df).shape[1]
param_grid = {
    'classifier__max_depth':stats.randint(low=2, high=100),
    'classifier__max_features':stats.randint(low=2, high=n_features_after_transform),
    'classifier__gamma':stats.uniform.rvs(0, 0.25, size=10000),
    'classifier__subsample':stats.uniform.rvs(0.5, 0.5, size=10000),
    'classifier__reg_alpha':stats.uniform.rvs(0.5, 1., size=10000),
    'classifier__reg_lambda':stats.uniform.rvs(0.5, 1., size=10000)
}
rscv = RandomizedSearchCV(
    clf_pipeline,
    param_grid,
    n_iter=60,
    cv=StratifiedKFold(n_splits=3, shuffle=True)
)
rscv.fit(df, y)
# In[42]:
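# Set the tuned best params and beef up the number of estimators.
clf_pipeline.set_params(**rscv.best_params_)
clf_pipeline.named_steps.classifier.set_params(n_estimators=200)  # illustrative value, larger than the 20 used for the search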
So long story short, you can set an individual parameter by accessing the class you want to set the param for in the pipeline's named_steps. To set the parameters that your Grid Search identified as best, use pipeline.set_params(**grid.best_params_)
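Applied to the iris example from the question, the whole flow might look like the sketch below. The only addition is `random_state=0` in the split, which is an assumption made here so the run is repeatable. It is not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_iris()
X_trainset, X_testset, y_trainset, y_testset = train_test_split(
    data['data'], data['target'], test_size=0.2,
    random_state=0)  # assumption: fixed seed for repeatability

pipeline = Pipeline([('scaler', StandardScaler()), ('SVM', SVC())])
grid = GridSearchCV(pipeline, param_grid={'SVM__gamma': [0.1, 0.01]}, cv=5)
grid.fit(X_trainset, y_trainset)
print(grid.best_params_, grid.score(X_testset, y_testset))

# Merge training and test data for the final pre-deployment fit.
big_x = np.concatenate([X_trainset, X_testset])
big_y = np.concatenate([y_trainset, y_testset])

# Hyperparameters go in through set_params(); fit() only receives the data.
# set_params() returns the pipeline itself, so the calls can be chained.
model2 = pipeline.set_params(**grid.best_params_).fit(big_x, big_y)
```

GridSearchCV fits clones of the estimator it is given, so `pipeline` still holds its default parameters until `set_params()` is called.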