
ValueError: not enough values to unpack in GridSearch with Scikit


I'm trying to tune the alpha parameter of a Multinomial Naive Bayes classifier on the 20newsgroups dataset. This is my code so far:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
import numpy as np

# Divide dataset 
dataset_train = fetch_20newsgroups(subset='train', shuffle=True)
dataset_test = fetch_20newsgroups(subset='test', shuffle=True)



text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer(sublinear_tf=True)),
                     ('clf', MultinomialNB())])

param_grid = {'tfidf__use_idf': (True, False),
              'clf__alpha' : np.linspace(0.001, 1, 100)}

grid_search = GridSearchCV(text_clf, param_grid=param_grid, scoring='precision', cv = None)

# Training
text_clf = grid_search.fit(dataset_train.data,dataset_train.target, average=None)

#prediction
predicted = text_clf.predict(dataset_test.data)


print("NB Accuracy:", 100*np.mean(predicted == dataset_test.target), '%')
print(classification_report(dataset_test.target, predicted, target_names=dataset_train.target_names))
print("Best estimator for alpha in order to get precision ", grid_search.best_estimator_)

The problem is I'm getting the following error:

runfile('C:/Users/omarl/Downloads/new_NB.py', wdir='C:/Users/omarl/Downloads')
Traceback (most recent call last):

  File "<ipython-input-12-d478372ef22a>", line 1, in <module>
    runfile('C:/Users/omarl/Downloads/new_NB.py', wdir='C:/Users/omarl/Downloads')

  File "C:\Users\omarl\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/omarl/Downloads/new_NB.py", line 28, in <module>
    text_clf = grid_search.fit(dataset_train.data,dataset_train.target, average=None)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 639, in fit
    cv.split(X, y, groups)))

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 332, in __init__
    self.results = batch()

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 458, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)

  File "C:\Users\omarl\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 197, in _fit
    step, param = pname.split('__', 1)

ValueError: not enough values to unpack (expected 2, got 1)

I have no clue why this is happening, because from the code I have reviewed so far this should work. I also searched the scikit-learn website but didn't find anything. Thanks.


Solution

  • In this line:

    text_clf = grid_search.fit(dataset_train.data,dataset_train.target, average=None)

average=None is being interpreted as a fit parameter, which is not what you intend. GridSearchCV forwards any extra keyword arguments of fit to the underlying estimator, and Pipeline.fit expects fit parameters named like step__param; since 'average' contains no '__', the split on '__' in pipeline.py fails with "not enough values to unpack".

After removing this, you will get this error:

    ValueError: Target is multiclass but average='binary'. Please choose another average setting.

This is because the default 'precision' scorer uses average='binary', which is not defined for a multi-class target. If you change your scoring parameter to 'accuracy' (or a multi-class precision scorer such as 'precision_macro'), the code works, as in the sketch below.
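
    A minimal sketch of the corrected training call, assuming the pipeline and param_grid defined above are unchanged; the cv=5 value and the 'precision_macro' alternative mentioned in the comments are illustrative choices, not taken from the original code:

    # Corrected training call: drop average=None (it is not a fit parameter)
    # and use a scorer that is defined for multi-class targets.
    # 'precision_macro' would also work if you specifically want precision.
    grid_search = GridSearchCV(text_clf, param_grid=param_grid,
                               scoring='accuracy', cv=5)
    grid_search.fit(dataset_train.data, dataset_train.target)

    # Per-class precision can still be reported after prediction,
    # instead of passing average=None to fit:
    predicted = grid_search.predict(dataset_test.data)
    print(classification_report(dataset_test.target, predicted,
                                target_names=dataset_train.target_names))
    print("Best alpha:", grid_search.best_params_['clf__alpha'])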