Tags: python-3.x, pandas, token, word-cloud, stop-words

Plot a word cloud without stopwords


I am trying to plot a word cloud from a column in my pandas DataFrame.

Here is my code:

    all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] )

    word_Cloud=WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate(all_words)

    plt.imshow(word_Cloud, interpolation='bilinear')

The column I want to plot, tweet_table['tokens'], looks like this:

0        [da, trumpanzee, follower, blm, balance, wp, g...
1        [counting, blacklivesmatter, received, trainin...
2        [okay, like, little, kids, pretty, smart, know...
3        [thank, oscopelabs, got, mounted, loud, amp, p...
4        [bpi, proud, supported, hoops, 4l, f, e, see, ...
                               ...                        
44713    [tomorrow, buy, charity, compilation, undergro...
44714    [needs, erected, state, capitol, think, darkfa...
44715    [clay, county, sheriffs, motto, screw, amp, in...
44716    [films, eleven, films, bravo, bad, ass, video,...
44717                       [everybody, give, listen, blm]

My code above gives me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-227-4066d6d1a153> in <module>
      2 # REMOVE STOP WORDS
      3 
----> 4 all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] )


TypeError: sequence item 0: expected str instance, list found

How can I fix this error? The column tweet_table['tokens'] is already tokenized and free of stopwords.

Many thanks.

P.S. When I use similar code on the column tweet_table['clean_text'], it works fine.

The column tweet_table['clean_text'] looks like this:

0            You have a da trumpanzee follower in      ...
1          Over 279  and counting   If  BlackLivesMatte...
2        Okay but like little kids are pretty smart and...
3        Thank you oscopelabs  got it mounted loud  amp...
4        BPI is proud to have supported Hoops4L Y F E  ...
                               ...                        
44713    TOMORROW you can buy the   charity compilation...
44714        That needs to be erected at the State Capi...
44715      Clay County Sheriffs  Motto  To Screw  amp  ...
44716      Films Eleven Films bravo         Bad ass vid...
44717              everybody should give this a listen ...
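The two columns behave differently because str.join() only accepts an iterable of strings. Each clean_text cell is a string, but each tokens cell is a list, which is exactly what the traceback reports. Below is a minimal reproduction. It uses a small hypothetical DataFrame standing in for tweet_table, and its rows are made up for illustration. The sketch also shows the usual fix: join the words inside each list first, then join the rows. Using a space as the outer separator rather than '' keeps the last word of one tweet from being glued to the first word of the next.

```python
import pandas as pd

# Hypothetical stand-in for tweet_table: 'tokens' holds lists, 'clean_text' holds strings
tweet_table = pd.DataFrame({
    "tokens": [["everybody", "give", "listen", "blm"], ["thank", "oscopelabs"]],
    "clean_text": ["everybody should give this a listen", "Thank you oscopelabs"],
})

# Works: every item in this column is already a str
text_from_strings = " ".join(tweet_table["clean_text"])

# Fails: iterating the Series yields lists, so str.join raises TypeError
error_message = None
try:
    "".join(tweet_table["tokens"])
except TypeError as exc:
    error_message = str(exc)

# Fix: join the words inside each list, then join the rows with a space
text_from_tokens = " ".join(" ".join(tokens) for tokens in tweet_table["tokens"])

print(error_message)
print(text_from_tokens)
```

The resulting text_from_tokens string can be passed to WordCloud(...).generate() in the same way as the joined clean_text column.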

Solution

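  • I just got it fixed. Each cell of tweet_table['tokens'] is a list, so the words in each list have to be joined into a string before the rows are joined. (Wrapping the whole column in str() also silences the error. However, str(tweet_table['tokens']) is only the truncated printed preview of the Series, including index numbers, brackets and "...". That preview shows just the first and last few rows, so most tweets would be left out.)

    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Join the words in each token list, then join all rows with spaces
    allwords = ' '.join(' '.join(tokens) for tokens in tweet_table['tokens'])

    word_Cloud = WordCloud(width=500, height=300, random_state=21,
                           max_font_size=119).generate(allwords)

    plt.imshow(word_Cloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()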
    

    This assumes tweet_table['tokens'] is already free of stopwords. If it is not, build a stopword list and pass it to WordCloud through its stopwords parameter, as in the code below:

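    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS
    
    # Tweet-specific extras (URL fragments) on top of wordcloud's built-in stopwords
    stopwords_newlist = ["https", "co"] + list(STOPWORDS)
    
    # Join the words in each token list, then join all rows with spaces
    allwords = ' '.join(' '.join(tokens) for tokens in tweet_table['tokens'])
    
    word_Cloud = WordCloud(width=500, height=300, random_state=21,
                           stopwords=stopwords_newlist,
                           max_font_size=119).generate(allwords)
    
    plt.imshow(word_Cloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()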
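The tokens are already split, so an alternative is to count the words yourself and hand the counts to WordCloud.generate_from_frequencies(). With that method, WordCloud does not re-split the text using its own regular expression. It also skips the two-word phrases that generate() may add with its default collocations=True. Note that generate_from_frequencies() does not apply the stopwords parameter, so stopwords must be removed before counting. The following is a minimal sketch under those assumptions. The helper token_frequencies, the sample rows and the custom_stopwords set are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def token_frequencies(token_lists: Iterable[List[str]], stopwords: Set[str]) -> Counter:
    """Hypothetical helper: count words across all token lists, skipping stopwords.

    Comparison is case-insensitive and the counted words are lowercased.
    """
    lowered_stops = {word.lower() for word in stopwords}
    counts = Counter()
    for tokens in token_lists:
        counts.update(t.lower() for t in tokens if t.lower() not in lowered_stops)
    return counts


# Hypothetical sample rows shaped like tweet_table['tokens']
sample_tokens = [
    ["everybody", "give", "listen", "blm"],
    ["counting", "blacklivesmatter", "https", "co", "blm"],
]

# Hypothetical extra stopwords; in practice they could be combined with wordcloud.STOPWORDS
custom_stopwords = {"https", "co"}

freqs = token_frequencies(sample_tokens, custom_stopwords)
print(freqs.most_common(3))
```

With the real data, the call would be something like token_frequencies(tweet_table['tokens'], set(STOPWORDS) | {"https", "co"}). The resulting Counter can then be passed as WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate_from_frequencies(freqs).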