Tags: numpy, scikit-learn, text-classification, naivebayes, tfidfvectorizer

'numpy.ndarray' object has no attribute 'lower'


I am fairly new to ML and am trying to fit some data with my Naive Bayes classifier.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Naïve Bayes:
text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()),
                     ('clf', MultinomialNB()),
])

# Linear SVC:
text_clf_lsvc = Pipeline([('tfidf', TfidfVectorizer()),
                     ('clf', LinearSVC()),
])

Code for fitting the data:

text_clf_nb.fit(X_train, y_train)

The shapes of my training and test data are:

X_train.shape, X_test.shape, y_train.shape, y_test.shape : ((169, 1), (84,), (169, 1), (84,))

But I keep getting: 'numpy.ndarray' object has no attribute 'lower'

Here is the full traceback for the error:

AttributeError                            Traceback (most recent call last)
<ipython-input-57-139757126594> in <module>
----> 1 text_clf_nb.fit(X_train, y_train)

~\miniconda3\envs\nlp_course\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    263             This estimator
    264         """
--> 265         Xt, fit_params = self._fit(X, y, **fit_params)
    266         if self._final_estimator is not None:
    267             self._final_estimator.fit(Xt, y, **fit_params)

~\miniconda3\envs\nlp_course\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params)
    228                 Xt, fitted_transformer = fit_transform_one_cached(
    229                     cloned_transformer, Xt, y, None,
--> 230                     **fit_params_steps[name])
    231                 # Replace the transformer of the step with the fitted
    232                 # transformer. This is necessary when loading the transformer

~\miniconda3\envs\nlp_course\lib\site-packages\sklearn\externals\joblib\memory.py in __call__(self, *args, **kwargs)
    340 
    341     def __call__(self, *args, **kwargs):
--> 342         return self.func(*args, **kwargs)
    343 
    344     def call_and_shelve(self, *args, **kwargs):

Solution

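  • You have checked the shapes of the arrays, but note that X_train has shape (169, 1), which is two-dimensional. TfidfVectorizer expects a one-dimensional iterable of strings. Given a 2-D array, it receives each row as a one-element numpy.ndarray and fails when it calls .lower() on it. Flatten the array before vectorizing, for example:

    data = vectorizer.fit_transform(array.ravel())

    With the pipeline from the question, that means calling text_clf_nb.fit(X_train.ravel(), y_train.ravel()).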
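
A minimal sketch of the flattening step, reusing the Naive Bayes pipeline from the question. The toy strings and labels are made up for illustration and only stand in for the asker's real X_train and y_train. The one assumption carried over is that both arrays have shape (n_samples, 1):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the asker's arrays,
# shaped (n_samples, 1) like X_train and y_train in the question.
X_train = np.array([["free prize inside"], ["meeting at noon"],
                    ["win a free prize"], ["lunch meeting today"]])
y_train = np.array([["spam"], ["ham"], ["spam"], ["ham"]])

text_clf_nb = Pipeline([("tfidf", TfidfVectorizer()),
                        ("clf", MultinomialNB())])

# Iterating over a 2-D array yields one-element ndarrays, not strings.
# Those rows are what the vectorizer would try to call .lower() on.
first_row = X_train[0]

# Flatten both arrays to shape (n_samples,) before fitting.
X_flat = X_train.ravel()
y_flat = y_train.ravel()
text_clf_nb.fit(X_flat, y_flat)

# New documents must also be passed as a 1-D sequence of strings.
predictions = text_clf_nb.predict(np.array(["free prize"]))
```

If the text lives in a pandas DataFrame, selecting a single column as a Series (rather than a one-column DataFrame) should give the same one-dimensional input without needing ravel().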