Search code examples
pythonpandasdataframedata-analysis

So there is CM_satisfaction coulmn that contains different values like Fully Dissatisfied,Somewhat Satisfied and Fully_Satisfied in pandas


So there is CM_satisfaction column that contains different values like Fully Dissatisfied, Somewhat Satisfied, and Fully_Satisfied in pandas. I need how many voters where Fully_Satisfied. I have found out that using

Fully_Satisfied=survey[survey["CM_satisfaction"] == "Fully_Satisfied"]

Next, I need to find how many voters were Fully_Satisfied by each day of sample collection.I have also found out that

day-wise=only_date_and_cm.groupby("collection_date").count()

The result I got is

collection_date CM_satisfaction

2020-08-24 146

2020-08-25 95

2020-08-26 59

2020-08-27 54

2020-08-28 92

2020-08-29 73

2020-08-30 65

2020-08-31 62

2020-09-01 75

Now the main part For each day of sample collection, determine the proportion of respondents who were fully satisfied with the performance of the CM. So if there were a total of 1000 samples on day 1 and 300 out of those said they were fully satisfied, then our answer for that day would be 0.3.

I have also found total votes on each day

day_wise_vote=survey.collection_date.value_counts()
day_wise_vote

2020-08-24 1479

2020-08-25 998

2020-08-28 761

2020-08-27 665

2020-08-29 620

2020-09-01 607

2020-08-26 598

2020-08-30 582

2020-08-31 557

Now how do I find the proportion of respondents who were fully satisfied with the performance of the CM day wise?


Solution

  • I think you can simply do:

    day_wise_proportion = day_wise.astype(float) / day_wise_vote
    

    The idea is that when you divide a pandas Series by another (of the same length and index), python will divide the values row by row.


    It also looks like you create many intermediate dataframes. You could do somethink like that, which will be faster and clearer :

    survey['Fully_Satisfied'] = (survey["CM_satisfaction"] == "Fully_Satisfied").astype(float)
    day_wise = survey.groupby("collection_date").agg({'Fully_Satisfied': 'sum',
                                                      'CM_satisfaction': 'count'})
    day_wise.rename(columns={'CM_satisfaction': 'vote'}, inplace=True)
    
    day_wise['proportion'] = day_wise['Fully_Satisfied'] / day_wise['vote']