So there is CM_satisfaction column that contains different values like Fully Dissatisfied, Somewhat Satisfied, and Fully_Satisfied in pandas. I need how many voters where Fully_Satisfied. I have found out that using
Fully_Satisfied=survey[survey["CM_satisfaction"] == "Fully_Satisfied"]
Next, I need to find how many voters were Fully_Satisfied by each day of sample collection.I have also found out that
day-wise=only_date_and_cm.groupby("collection_date").count()
The result I got is
collection_date CM_satisfaction
2020-08-24 146
2020-08-25 95
2020-08-26 59
2020-08-27 54
2020-08-28 92
2020-08-29 73
2020-08-30 65
2020-08-31 62
2020-09-01 75
Now the main part For each day of sample collection, determine the proportion of respondents who were fully satisfied with the performance of the CM. So if there were a total of 1000 samples on day 1 and 300 out of those said they were fully satisfied, then our answer for that day would be 0.3.
I have also found total votes on each day
day_wise_vote=survey.collection_date.value_counts()
day_wise_vote
2020-08-24 1479
2020-08-25 998
2020-08-28 761
2020-08-27 665
2020-08-29 620
2020-09-01 607
2020-08-26 598
2020-08-30 582
2020-08-31 557
Now how do I find the proportion of respondents who were fully satisfied with the performance of the CM day wise?
I think you can simply do:
day_wise_proportion = day_wise.astype(float) / day_wise_vote
The idea is that when you divide a pandas Series by another (of the same length and index), python will divide the values row by row.
It also looks like you create many intermediate dataframes. You could do somethink like that, which will be faster and clearer :
survey['Fully_Satisfied'] = (survey["CM_satisfaction"] == "Fully_Satisfied").astype(float)
day_wise = survey.groupby("collection_date").agg({'Fully_Satisfied': 'sum',
'CM_satisfaction': 'count'})
day_wise.rename(columns={'CM_satisfaction': 'vote'}, inplace=True)
day_wise['proportion'] = day_wise['Fully_Satisfied'] / day_wise['vote']