python model nlp countvectorizer streamlit

Streamlit ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am trying to fit my model in a Streamlit app, but I get the ValueError above. The same code does not raise this error in a Jupyter Notebook. Any suggestions for a better approach would help a lot.

 
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    File "c:\users\8470p\anaconda3\lib\site-packages\streamlit\ScriptRunner.py", line 311, in _run_script
      exec(code, module.__dict__)
    File "C:\Users\8470p\app2.py", line 122, in <module>
      bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])
    File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1024, in fit
      self.fit_transform(raw_documents)
    File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform
      self.fixed_vocabulary_)
    File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 962, in _count_vocab
      analyze = self.build_analyzer()
    File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 339, in build_analyzer
      if self.analyzer == 'char':
    File "c:\users\8470p\anaconda3\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__
      self.__class__.__name__




    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import classification_report
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB

    bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])

    msg_train, msg_test, label_train, label_test = train_test_split(
        messages['message'], messages['label'], test_size=0.2)

    pipeline = Pipeline([
        ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
        ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
        ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
    ])

    NB_Clasifier = pipeline.fit(msg_train, label_train)

Solution

One big clue is that it works in Jupyter notebook but not in Streamlit, which suggests there are differences in your working environments.

The error you're seeing is raised by pandas when a Series is used where a single True/False value is expected, for example as the condition of an `if` statement. There is a very good explanation of this error in this Stack Overflow answer.
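To make that concrete, here is a minimal sketch that reproduces the same message outside of sklearn. The `analyzer` variable is a hypothetical stand-in, not taken from your app; it only mirrors the last sklearn line in your traceback (`if self.analyzer == 'char':`). If that comparison is what fails, one thing that may be worth checking is whether `text_process` is still a function in the Streamlit script rather than a Series.

```python
import pandas as pd

# Hypothetical stand-in: a Series where a string or a function was expected.
analyzer = pd.Series(["some text", "more text"])

message = None
try:
    # Comparing a Series to a string gives a boolean Series, one value per row.
    # `if` needs a single True/False, so pandas raises ValueError here.
    if analyzer == "char":
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Reducing the boolean Series explicitly is unambiguous.
any_match = (analyzer == "char").any()
print(any_match)
```

The explicit `.any()` call at the end is the kind of reduction the error message asks for; it tells pandas how to collapse the per-row results into one value.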

But since your error is buried in sklearn (not your own code), chances are the problem can be solved by matching the sklearn version used in Jupyter to the version installed in the environment where you run Streamlit.

If you update your post with the versions of pandas, scikit-learn, and Python you are using in each case (Jupyter and Streamlit), it will be easier to help you figure this out.
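One way to collect those versions is a small helper like the sketch below, run once in a notebook cell and once at the top of the Streamlit script. The function name `environment_report` is made up for illustration; it only relies on the standard library and on each package's `__version__` attribute.

```python
import importlib
import platform
import sys


def environment_report(packages=("pandas", "sklearn")):
    """Return the Python version, interpreter path, and package versions."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,  # shows which installation is running
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = "not installed"
    return report


for name, value in environment_report().items():
    print(f"{name}: {value}")
```

When this runs inside a Streamlit script, the `print` output goes to the terminal where you started `streamlit run`. If the two reports differ, especially in `executable`, the notebook and the app are probably not using the same environment.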

It may also help to post the entire traceback (not just the top half) as plain text rather than a screenshot.

Thanks for trying out Streamlit!