I have a dataset that tracks categorical (Yes/No) responses to a questionnaire formatted like this:
Client_Id | Question_1 | Question_2 | Question_3 | Question_4 |
---|---|---|---|---|
1000 | No | No | Yes | Yes |
1001 | Yes | Yes | Yes | No |
1002 | No | Yes | Yes | Yes |
I'm trying to count the number of Y or N responses to each question and transform them into a dataset formatted like this:
Response | Question_1 | Question_2 | Question_3 | Question_4 |
---|---|---|---|---|
Yes | 1 | 2 | 3 | 2 |
No | 2 | 1 | 0 | 1 |
I've successfully counted the responses with this for loop:
for c in test_df.columns: d = test_df[c].value_counts().reindex(['Yes','No'], fill_value=0) print(d)
Now I just want to put the information into a new dataset.
One option using sum
(only if you have a binary value like Yes/No):
m = df.filter(like='Question').eq('Yes').sum()
out = pd.DataFrame.from_dict({'Yes': m, 'No': len(df)-m}, orient='index')
Or a more generic approach with melt
and value_counts
:
out = (df.filter(like='Question')
.melt().value_counts()
.unstack('variable', fill_value=0)
.rename_axis(index=None, columns=None)
)
Or crosstab
:
s = df.filter(like='Question').stack()
out = (pd.crosstab(s, s.index.get_level_values(1))
.rename_axis(index=None, columns=None)
)
Output:
Question_1 Question_2 Question_3 Question_4
Yes 1 2 3 2
No 2 1 0 1