I am trying to vectorize a sentiment data set. It contains review text and a sentiment label for each review. When I try to vectorize the data set, it raises the error: 'LazyCorpusLoader' object is not iterable
The reviews were cleaned as follows.
After this, my dataframe reviewDataset_Df has the following columns:
Then I split the data set using the code below:
#splitting data set into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(
    reviewDataset_Df.head(10000).review_clean, reviewDataset_Df.head(10000).SENTIMENT,
    test_size=0.20, random_state=0, shuffle=True)
print('Training data count:'+str(len(X_train)))
print('Test data count:'+str(len(X_test)))
That worked well.
Then I vectorize the text using the following code.
#vectorizer
tfidf=TfidfVectorizer(sublinear_tf=True,min_df=3,stop_words=english,norm='l2',encoding='utf-8',ngram_range=(1,3))
print("rr")
train_features=tfidf.fit_transform(X_train)
test_features=tfidf.transform(X_test)
train_labels=Y_train
test_labels=Y_test
This gives the following error:
return frozenset(stop)
TypeError: 'LazyCorpusLoader' object is not iterable
I searched for and tried some solutions, but none of them worked. How can I overcome this error? I need to vectorize the data set to train a recommendation system.
Note: I searched the internet and read similar questions on Stack Overflow but couldn't find a proper answer.
Without a proper error trace we can only guess.
Since the error involves stop, my guess is that your variable english (which isn't in the code you shared at all) is inappropriately set up, and is not a set of words.
You probably meant to use stop_words="english" instead.
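To make the suggestion concrete, here is a minimal sketch of the two usual ways to pass stop words to TfidfVectorizer. The docs list is made-up stand-in data, not your reviews, and min_df is left out only because the toy corpus is too small for it. LazyCorpusLoader is a class from NLTK, so one plausible cause (an assumption, since the definition of english was not shown) is that english refers to the nltk.corpus.stopwords object itself rather than to a list of words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-in corpus (hypothetical data, not the asker's reviews).
docs = [
    "the movie was great and the acting was great",
    "the movie was terrible and the plot was boring",
    "great plot and great acting",
]

# Option 1: pass the *string* "english" to use scikit-learn's built-in list.
tfidf = TfidfVectorizer(
    sublinear_tf=True,
    stop_words="english",
    norm="l2",
    encoding="utf-8",
    ngram_range=(1, 3),
)
features = tfidf.fit_transform(docs)

# Option 2: pass an explicit list of words.
# If the NLTK list is what you want, nltk.corpus.stopwords.words("english")
# returns such a list (it needs the NLTK stopwords corpus to be downloaded).
custom_stop_words = ["the", "was", "and"]
tfidf_custom = TfidfVectorizer(stop_words=custom_stop_words)
features_custom = tfidf_custom.fit_transform(docs)

print(features.shape)
print(sorted(tfidf_custom.vocabulary_))
```

In both cases stop_words receives something the vectorizer can work with: either the literal string "english" or a plain list of strings. Printing type(english) in your own code should confirm whether it is a corpus loader object instead.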