python scikit-learn pipeline gridsearchcv

Invalid Parameters for Sklearn GridSearchCV


I get ValueError: Invalid parameter... for every line in my grid.

I have tried removing the grid options one line at a time until the grid was empty. I also copied and pasted the parameter names from pipeline.get_params() to make sure they contain no typos.

from sklearn.model_selection import train_test_split
x_in, x_out, y_in, y_out = train_test_split(X, Y, test_size=0.2, stratify=Y)

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

grid = {
    'TF-IDF__ngram_range':[(1,2),(2,3)],
    'TF-IDF__stop_words': [None, 'english'],
    'SelectKBest__k': [10000, 15000],
    'SelectKBest__score_func': [f_classif, chi2],
    'linearSVC__penalty': ['l1', 'l2']
}

pipeline = Pipeline([('tfidf', TfidfVectorizer(sublinear_tf=True)),
                     ('selectkbest', SelectKBest()),
                     ('linearscv', LinearSVC(max_iter=10000, dual=False))])

grid_search = GridSearchCV(pipeline, param_grid=grid, scoring='accuracy', n_jobs=-1, cv=5)
grid_search.fit(X=x_in, y=y_in)


Solution

  • I think you are not referring to the pipeline stages by the correct names in the grid. Each grid key must start with the name you assigned to that stage in the pipeline (tfidf, selectkbest, linearscv), followed by a double underscore and the parameter name, and the match is case-sensitive. I would do:

    # The first element of each tuple is the step name used as the grid prefix.
    pipeline = Pipeline([('tfidf', TfidfVectorizer(sublinear_tf=True)),
                         ('selectkbest', SelectKBest()),
                         ('linearscv', LinearSVC(max_iter=10000, dual=False))])

    # Keys follow the pattern '<step name>__<parameter name>'.
    grid = {
        'tfidf__ngram_range': [(1, 2), (2, 3)],
        'tfidf__stop_words': [None, 'english'],
        'selectkbest__k': [10000, 15000],
        'selectkbest__score_func': [f_classif, chi2],
        'linearscv__penalty': ['l1', 'l2']
    }
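
A quick way to catch this kind of mismatch before running the search is to compare the grid keys against what the pipeline reports from get_params(). The sketch below is only an illustration: unknown_grid_keys is a hypothetical helper (it is not part of scikit-learn), and the two small grids are shortened versions of the ones above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([('tfidf', TfidfVectorizer(sublinear_tf=True)),
                     ('selectkbest', SelectKBest()),
                     ('linearscv', LinearSVC(max_iter=10000, dual=False))])


def unknown_grid_keys(estimator, grid):
    """Hypothetical helper: return the grid keys the estimator does not accept."""
    # get_params(deep=True) includes nested names such as 'tfidf__ngram_range'.
    valid = estimator.get_params(deep=True)
    return sorted(key for key in grid if key not in valid)


# Keys written with the wrong step names, as in the question.
bad_grid = {'TF-IDF__ngram_range': [(1, 2)], 'linearSVC__penalty': ['l1', 'l2']}
# Keys that use the step names assigned in the pipeline.
good_grid = {'tfidf__ngram_range': [(1, 2)], 'linearscv__penalty': ['l1', 'l2']}

print(unknown_grid_keys(pipeline, bad_grid))   # ['TF-IDF__ngram_range', 'linearSVC__penalty']
print(unknown_grid_keys(pipeline, good_grid))  # []
```

An empty list means every key in the grid maps to a parameter the pipeline actually exposes, so GridSearchCV should not raise the invalid parameter error for those names.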