I have a data that initially looks like this:
column1 | Time |
---|---|
yes | 271-273 |
no | 271-273 |
neutral | 271-273 |
no | 274-276 |
... | ... |
I want the percentage of yes, percentage of no, and percentage of neutral across each Time range. I was able to get the count of each one of these categories (yes,no,neutral) across every time range using the following code:
df['COUNTER'] =1
group_data = df.groupby(['Time','column1'])['COUNTER'].sum()
I am not sure how to calculate the percentage of counts from this.
Use SeriesGroupBy.value_counts
with parameter normalize=True
:
print (df)
column1 Time
0 yes 271-273
1 no 271-273
2 neutral 271-273
3 no 271-273 <- changed data for better sample
group_data = (df.groupby(['Time'])['column1']
.value_counts(normalize=True)
.reset_index(name='%') )
print (group_data)
Time column1 %
0 271-273 no 0.50
1 271-273 neutral 0.25
2 271-273 yes 0.25
Another idea with divide by sum of counts with DataFrameGroupBy.size
and Series.div
:
s = df.groupby(['Time','column1']).size()
group_data = s.div(s.groupby(level=0).sum()).reset_index(name='%')
print (group_data)
Time column1 %
0 271-273 neutral 0.25
1 271-273 no 0.50
2 271-273 yes 0.25