I am trying to fit my model in a Streamlit app, but I get the ValueError below. The same code does not raise this error in a Jupyter Notebook. Any suggestions for a better approach would help a lot.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
File "c:\users\8470p\anaconda3\lib\site-packages\streamlit\ScriptRunner.py", line 311, in _run_script exec(code, module.__dict__)
File "C:\Users\8470p\app2.py", line 122, in bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1024, in fit self.fit_transform(raw_documents)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform self.fixed_vocabulary_)
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 962, in _count_vocab analyze = self.build_analyzer()
File "c:\users\8470p\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 339, in build_analyzer if self.analyzer == 'char':
File "c:\users\8470p\anaconda3\lib\site-packages\pandas\core\generic.py", line 1555, in __nonzero__ self.__class__.__name__
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
bow_transformer = CountVectorizer(analyzer=text_process).fit(messages['message'])
msg_train, msg_test, label_train, label_test = train_test_split(messages['message'], messages['label'], test_size=0.2)
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),  # strings to token integer counts
    ('tfidf', TfidfTransformer()),  # integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # train on TF-IDF vectors w/ Naive Bayes classifier
])
NB_Clasifier = pipeline.fit(msg_train,label_train)
One big clue is that it works in Jupyter notebook but not in Streamlit, which suggests there are differences in your working environments.
The error you're seeing is raised by pandas when a Series is used where a single True/False value is expected, for example in an if statement. There is a very good explanation of this error in this Stack Overflow answer.
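To make the pandas error concrete, here is a minimal sketch that reproduces the same message outside of sklearn. The `analyzer` Series and the `truth_value_error` helper are hypothetical names chosen for illustration; the comparison mirrors the `if self.analyzer == 'char':` frame in the traceback, which would behave this way if a Series ended up in that attribute. That reading is an assumption, not something the traceback proves on its own.

```python
import pandas as pd

# Hypothetical stand-in for a Series that ends up where a plain value was expected.
analyzer = pd.Series(["some", "text"])

# Comparing a Series to a string is element-wise: the result is a Series of booleans.
comparison = analyzer == "char"


def truth_value_error(obj):
    """Return the ValueError message raised when obj is used in an `if`, else None."""
    try:
        if obj:
            pass
    except ValueError as exc:
        return str(exc)
    return None


message = truth_value_error(comparison)
print(message)
```

If this is what is happening in the Streamlit script, checking `callable(text_process)` just before the `CountVectorizer` call would show whether `text_process` is still a function at that point.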
But since your error is buried in sklearn (not your own code), chances are the problem you're having can be solved by matching the sklearn version that's being used in Jupyter to the version you have installed when you use Streamlit.
If you update your post with the versions of pandas, scikit-learn, and Python you are using in each case (Jupyter and Streamlit), it will be easier to help you figure this out.
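One way to collect those versions is to run the same small script in both environments and compare the output. This is a sketch, not an official Streamlit or sklearn tool; the `environment_report` function name and the default package list are assumptions made for this example.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("pandas", "scikit-learn", "streamlit")):
    """Collect the Python version, interpreter path, and installed package versions."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,  # shows which interpreter is actually running
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Run it once in a Jupyter cell and once from the same command line you use to launch Streamlit. If the `executable` paths differ, the two environments are using different Python installations, which would explain differing package versions.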
It may also help to post the entire traceback (not just the top half) as plain text rather than a screenshot.
Thanks for trying out Streamlit!