python word-cloud stop-words

Word cloud shows several stray apostrophes (') among the words and I'm not sure why


I tried to exclude them with a " ' ", but that failed. I'm not sure where they are coming from, as they are not in the document. Thanks for any help.

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np 

url = 'https://raw.githubusercontent.com/Imme21/WordCloud/main/StockData3.csv'
df = pd.read_csv(url, on_bad_lines='skip')  # error_bad_lines was removed in pandas 2.0
df.dropna(inplace = True)
text = df['Stock'].values

wordcloud = WordCloud(background_color = 'white',
            stopwords = ['Date','Stock', 'Tickers', 
                         'Open','Close', 'High', 
                         'Low', 'IV', 'under',
                         'over', 'price', 'change', 
                         '%', 'null']).generate(str(text))


plt.imshow(wordcloud) 
plt.axis("off")
plt.show()



Solution

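  • The problem is how the string is built from the values in the dataframe column, specifically text = df['Stock'].values followed by .generate(str(text)). Calling str() on the array returns its printed representation, which wraps each element in quotes (and may abbreviate a long array with ...), so those quote characters end up in the text passed to the word cloud.

    Using pandas.Series.str.cat instead joins the cell values into a single plain string and gives the desired outcome: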

    ...
    >>> text = df['Stock'].str.cat(sep=' ')
    ...
    >>> wordcloud = WordCloud(background_color = 'white',
                stopwords = ['Date','Stock', 'Tickers', 
                             'Open','Close', 'High', 
                             'Low', 'IV', 'under',
                             'over', 'price', 'change', 
                             '%', 'null']).generate(text)
    ...
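
To see the difference without downloading the CSV, here is a minimal sketch on a small made-up dataframe. The three sample rows are an assumption for illustration only and are not taken from StockData3.csv. It compares the string produced by str() on the column values with the one produced by str.cat:

```python
import pandas as pd

# Hypothetical stand-in for the 'Stock' column of the real CSV.
df = pd.DataFrame({"Stock": ["AAPL under 150", "TSLA over 700", "MSFT price change"]})

# str() on the column values gives the printed form of the array,
# so the quotes around each element become part of the text.
as_repr = str(df["Stock"].values)

# Series.str.cat joins the cell values themselves, separated by spaces.
as_text = df["Stock"].str.cat(sep=" ")

print(as_repr)
print(as_text)
```

The first printed line contains quote characters around each element, while the second is just the words separated by spaces, which is the kind of input generate() is meant to receive.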