I tried to follow tutorials and answers on here but couldn't wrap my head around creating a pie chart from the data in my CSV. A sample of my CSV is below:
post_id post_title subreddit polarity subjectivity sentiment
0 bo7h4z ['league'] soccer -0.2 0.4 negative
1 bnvieg ['césar'] soccer 0 0 neutral
2 bnup5q ['foul'] soccer 0.1 0.6 positive
3 bnul4u ['benfica'] soccer 0.45 0.5 positive
4 bnthuf ['prediction'] soccer 0 0 neutral
5 bnolhc ['revolution' ] soccer 0 0 neutral
There are many more rows, but I need to plot the sentiment column: basically, how many rows are positive, neutral, or negative.
outfile = open("clean_soccer.csv","r", encoding='utf-8')
file=csv.reader(outfile)
next(file, None)
post_id = []
post_title = []
subreddit = []
polarity =[]
subjectivity = []
sentiment = []
for row in file:
    post_id.append(row[0])
    post_title.append(row[1])
    subreddit.append(row[2])
    polarity.append(row[3])
    subjectivity.append(row[4])
    sentiment.append(row[5])
plt.pie( , labels=)
plt.axis('equal')
plt.show()
Would it be something similar to this?
I will provide a brief answer that reads in only the sentiment column. You need split to access the sentiment column at index [5]. Then you can use Counter to compute the frequency of each label and use those values to plot the percentages in a pie chart.
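import csv
from collections import Counter
import matplotlib.pyplot as plt

outfile = open("clean_soccer.csv","r", encoding='utf-8')
file=csv.reader(outfile)
next(file, None)  # skip the header row
sentiment = []
for row in file:
    # each row arrives as a single string here, so split it on whitespace
    sentiment.append(row[0].split()[5])
# [:-1] leaves out the last collected entry; use Counter(sentiment) to count every row
counts = Counter(sentiment[:-1])
# pass plain lists rather than dict views to matplotlib
plt.pie(list(counts.values()), labels=list(counts.keys()), autopct='%1.1f%%')
plt.axis('equal')
plt.show()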
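If the file is actually comma-separated with a header row (an assumption, since the sample above is shown with spaces), a sketch along these lines reads the column by name instead of by position. The helper name count_sentiments is made up for illustration, and the sample rows are copied from the question.

```python
import csv
import io
from collections import Counter


def count_sentiments(lines, column="sentiment"):
    """Count how often each value appears in the given column."""
    # DictReader uses the header row as keys, so no positional index is needed.
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


# Rows from the question's sample, written in the assumed comma-separated layout.
sample = io.StringIO(
    "post_id,post_title,subreddit,polarity,subjectivity,sentiment\n"
    "bo7h4z,['league'],soccer,-0.2,0.4,negative\n"
    "bnvieg,['césar'],soccer,0,0,neutral\n"
    "bnup5q,['foul'],soccer,0.1,0.6,positive\n"
    "bnul4u,['benfica'],soccer,0.45,0.5,positive\n"
)
counts = count_sentiments(sample)
print(counts)
```

For the real file, pass an open file object instead of the sample, for example one opened with open("clean_soccer.csv", newline="", encoding="utf-8"). The resulting counts can then be given to plt.pie as lists of values and keys.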
EDIT: Answering your second question in the comments below
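import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("clean_soccer.csv")  # assumption: df is loaded like this; adjust sep= to match your file
df['sentiment'].value_counts().plot.pie(autopct='%1.1f%%')
plt.axis('equal')
plt.show()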
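To sanity-check the chart, the percentages that autopct='%1.1f%%' prints on each wedge can be reproduced with the standard library: each one is 100 * count / total, formatted with that same string. This is only a sketch; the list below is the six sentiment labels from the question's sample, standing in for the full column.

```python
from collections import Counter

# The six sentiment labels from the question's sample rows.
sentiment = ["negative", "neutral", "positive", "positive", "neutral", "neutral"]

counts = Counter(sentiment)
total = sum(counts.values())

# most_common() sorts by descending count, like value_counts() does by default.
shares = {}
for label, n in counts.most_common():
    # Same format string as the autopct argument.
    shares[label] = '%1.1f%%' % (100 * n / total)
    print(label, n, shares[label])
```

If the numbers printed here differ from the wedge labels, the counts being passed to the plot are probably not the ones you expect.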