Search code examples
pythonpandasdataframedata-structuresdata-analysis

How do I count categorical values from a dataset and transform the summation into a new dataset?


I have a dataset that tracks categorical (Yes/No) responses to a questionnaire formatted like this:

Client_Id Question_1 Question_2 Question_3 Question_4
1000 No No Yes Yes
1001 Yes Yes Yes No
1002 No Yes Yes Yes

I'm trying to count the number of Y or N responses to each question and transform them into a dataset formatted like this:

Response Question_1 Question_2 Question_3 Question_4
Yes 1 2 3 2
No 2 1 0 1

I've successfully counted the responses with this for loop:

for c in test_df.columns: d = test_df[c].value_counts().reindex(['Yes','No'], fill_value=0) print(d)

Now I just want to put the information into a new dataset.


Solution

  • One option using sum (only if you have a binary value like Yes/No):

    m = df.filter(like='Question').eq('Yes').sum()
    
    out = pd.DataFrame.from_dict({'Yes': m, 'No': len(df)-m}, orient='index')
    

    Or a more generic approach with melt and value_counts:

    out = (df.filter(like='Question')
             .melt().value_counts()
             .unstack('variable', fill_value=0)
             .rename_axis(index=None, columns=None)
          )
    

    Or crosstab:

    s = df.filter(like='Question').stack()
    
    out = (pd.crosstab(s, s.index.get_level_values(1))
             .rename_axis(index=None, columns=None)
           )
    

    Output:

         Question_1   Question_2   Question_3   Question_4
    Yes            1            2            3           2
    No             2            1            0           1