Tags: python-3.x, pandas, dataframe, nltk, stop-words

I'm performing sentiment analysis on the Amazon Fine Food Reviews dataset (https://www.kaggle.com/snap/amazon-fine-food-reviews).


I want to create a word cloud of the most frequently used words.

    import nltk
    from nltk.corpus import stopwords
    stopwords = set(STOPWORDS)
    stopwords.update(["br", "href"])
    textt = " ".join(review for review in df.Text)
    wordcloud = WordCloud(stopwords=stopwords).generate(textt)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.savefig('wordcloud11.png')
    plt.show()

I tried this code, but I get the following error: NameError: name 'STOPWORDS' is not defined. Can anybody please help me with this?


Solution

  • You have not defined STOPWORDS or WordCloud, so you need to import them first. Both are provided by the wordcloud package. Your snippet also uses plt, which the code below imports from matplotlib.pyplot. Here is the complete code you need. I removed the import nltk statement since you are not using it. I also assume you already have a pandas DataFrame df with a Text column.

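    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt

    # Assumes df is an existing pandas DataFrame with a "Text" column of reviews.
    # Start from wordcloud's built-in English stopword set and add HTML
    # remnants ("br", "href") that appear in the review text.
    stopwords = set(STOPWORDS)
    stopwords.update(["br", "href"])

    # Concatenate all reviews into one string.
    textt = " ".join(review for review in df.Text)

    # Build the word cloud, ignoring the stopwords defined above.
    wordcloud = WordCloud(stopwords=stopwords).generate(textt)

    # Display the image without axes, then save and show it.
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.savefig('wordcloud11.png')
    plt.show()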
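If you also want to see which words are likely to dominate the cloud, you can count them yourself. The sketch below is a rough, standard-library-only approximation, not wordcloud's own tokenizer: it strips HTML tags such as <br />, lowercases the text, drops stopwords, and counts the remaining words with collections.Counter. The helper top_words, the small EXAMPLE_STOPWORDS set, and the two sample reviews are hypothetical and only for illustration; with the real data you would pass df.Text and set(STOPWORDS) instead.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# with real data use wordcloud's STOPWORDS (plus "br", "href") instead.
EXAMPLE_STOPWORDS = {"a", "the", "and", "is", "it", "this", "i", "to", "of", "br", "href"}


def top_words(reviews, stopwords, n=10):
    """Return the n most common words across reviews, ignoring stopwords.

    Hypothetical helper: a rough approximation, not wordcloud's tokenizer.
    """
    counts = Counter()
    for review in reviews:
        # Replace HTML tags like <br /> or <a href="..."> with spaces;
        # this markup is where "br" and "href" come from in the reviews.
        text = re.sub(r"<[^>]+>", " ", review)
        # Lowercase and pull out runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Made-up sample reviews standing in for df.Text.
reviews = [
    "Great coffee.<br /><br />The flavor is great and the price is fair.",
    "This coffee is bland.<br />I would not buy it again.",
]
print(top_words(reviews, EXAMPLE_STOPWORDS, n=3))
```

Because top_words returns (word, count) pairs, dict(top_words(...)) can be handed to WordCloud.generate_from_frequencies if you prefer to control the counting step yourself rather than letting generate() tokenize the text.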