python, matplotlib, data-analysis, stop-words, word-cloud

How to add extra stop words in addition to default stopwords in wordcloud?


I would like to add certain words to the default stopwords list used in wordcloud. Current code:

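import matplotlib.pyplot as plt
from wordcloud import WordCloud

# twitter_clean is defined elsewhere; its text column holds the tweets
all_text = " ".join(rev for rev in twitter_clean.text)
stop_words = ["https", "co", "RT"]
wordcloud = WordCloud(stopwords=stop_words, background_color="white").generate(all_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()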

When I use the custom stop_words variable, words such as "is", "was", and "the" are all counted and displayed as high-frequency words. However, when I use the default stopwords list (no stopwords argument), many other unwanted words are displayed as highly frequent. How do I use my custom stop_words variable together with the default stopwords list in my wordcloud?


Solution

  • Combine your custom words with the built-in STOPWORDS set:

    From the wordcloud documentation:

    stopwords : set of strings or None. The words that will be eliminated. If None, the built-in STOPWORDS list will be used.

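    Passing your own list therefore replaces the defaults instead of extending them. To get both, import STOPWORDS from wordcloud, convert it to a list, and concatenate it with your custom words: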

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS

    all_text = " ".join(rev for rev in twitter_clean.text)
    # Custom words plus every default stop word
    stop_words = ["https", "co", "RT"] + list(STOPWORDS)
    wordcloud = WordCloud(stopwords=stop_words, background_color="white").generate(all_text)
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()