Tags: python-3.x, pandas, join, spam-prevention, word-cloud

Separate Spam and Ham for WordCloud Visualization


I am doing spam detection and want to visualize spam and ham keywords separately in a WordCloud. Here's how I load my .csv file:

import pandas as pd

data = pd.read_csv("spam.csv", encoding='latin-1')
data = data.rename(columns={"v1": "label", "v2": "message"})
data = data.replace({"spam": "1", "ham": "0"})

data.head()

Here's my code for WordCloud. I need help with spam_words. I cannot generate the right graph.

import matplotlib.pyplot as plt
from wordcloud import WordCloud 

spam_words = ' '.join(list(data[data['label'] == 1 ]['message']))
spam_wc = WordCloud(width = 512, height = 512).generate(spam_words)

plt.figure(figsize = (10,8), facecolor = 'k')
plt.imshow(spam_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()

Solution

  • The problem is that the code replaces "spam" and "ham" with the one-character strings "1" and "0", but then filters the DataFrame by comparing against the integer 1. A string never equals an integer, so the filter matches no rows and spam_words ends up empty. Change the replace line to this:

    data = data.replace({"spam": 1, "ham": 0})
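
To get separate spam and ham clouds, the same filter can be applied once per label. The following is a minimal sketch, not the asker's exact setup: the small inline DataFrame is a hypothetical stand-in for spam.csv, and words_for and show_cloud are illustrative helper names. It maps the labels on the label column only, which also avoids touching any message whose whole text happens to be "spam" or "ham".

```python
import pandas as pd

# Hypothetical stand-in for spam.csv so the sketch runs on its own.
data = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam"],
    "message": [
        "see you at lunch",
        "win a free prize now",
        "call me later",
        "free entry claim your prize",
    ],
})

# Map labels to integers on the label column only; messages are left untouched.
data["label"] = data["label"].map({"spam": 1, "ham": 0})


def words_for(df, label):
    """Join every message with the given integer label into one string."""
    return " ".join(df.loc[df["label"] == label, "message"])


spam_words = words_for(data, 1)
ham_words = words_for(data, 0)


def show_cloud(text):
    """Render one word cloud with the same settings as the question.

    The plotting imports are local so the text-building part above
    runs even where matplotlib and wordcloud are not installed.
    """
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    wc = WordCloud(width=512, height=512).generate(text)
    plt.figure(figsize=(10, 8), facecolor="k")
    plt.imshow(wc)
    plt.axis("off")
    plt.tight_layout(pad=0)
    plt.show()
```

Calling show_cloud(spam_words) and then show_cloud(ham_words) draws the two clouds one after the other. Checking data["label"].dtype before filtering is a quick way to confirm that the labels are integers and not strings.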