I am doing spam detection and want to visualize spam and ham keywords in separate word clouds. Here's how I load my .csv file:
import pandas as pd
data = pd.read_csv("spam.csv", encoding='latin-1')
data = data.rename(columns = {"v1":"label", "v2":"message"})
data = data.replace({"spam":"1","ham":"0"})
Here's my code for the word cloud. I need help with spam_words: the filter does not select the spam messages, so I cannot generate the right graph.
import matplotlib.pyplot as plt
from wordcloud import WordCloud
spam_words = ' '.join(list(data[data['label'] == 1 ]['message']))
spam_wc = WordCloud(width = 512, height = 512).generate(spam_words)
plt.figure(figsize = (10,8), facecolor = 'k')
plt.imshow(spam_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()
The issue is that the current code replaces "spam" and "ham" with the one-character strings "1" and "0", but you filter the DataFrame by comparing against the integer 1. A string "1" never equals the integer 1, so the filter matches no rows. Change the replace line to this:
data = data.replace({"spam": 1, "ham": 0})
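To see the mismatch without the real file, here is a minimal sketch on a made-up four-row DataFrame (the rows are hypothetical, not taken from spam.csv). It uses `Series.map` on the label column instead of `DataFrame.replace`, which is a variation on the fix above that leaves the message column untouched, and it builds the spam and ham text separately.

```python
import pandas as pd

# Hypothetical stand-in for spam.csv after the rename step.
data = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam"],
    "message": [
        "see you at lunch",
        "win a free prize now",
        "call me later",
        "free entry claim now",
    ],
})

# String labels: comparing "1" with the integer 1 is never true,
# so the filter selects no rows at all.
string_labels = data["label"].map({"spam": "1", "ham": "0"})
rows_matched_with_strings = int((string_labels == 1).sum())

# Integer labels: the same comparison now selects the spam rows.
data["label"] = data["label"].map({"spam": 1, "ham": 0})
spam_words = " ".join(data[data["label"] == 1]["message"])
ham_words = " ".join(data[data["label"] == 0]["message"])

print(rows_matched_with_strings)
print(spam_words)
print(ham_words)
```

Each of `spam_words` and `ham_words` can then be passed to `WordCloud(width = 512, height = 512).generate(...)` exactly as in the question to draw the two clouds separately.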