I am trying to make a contingency table using pd.crosstab
from my local dataframe. Imagine we asked 3 people in 2 separate groups the question of whether they like ice cream or not, and here is the result in a dataframe:
group1 | group2
------------------
yes | no
no | maybe
yes | no
And i would like the contingency table to look like this:
| group1 | group2
----------------------------
yes | 2 | 0
no | 1 | 2
maybe | 0 | 1
I have played around with pandas and evidently referenced many different resources, including the docs and other posts, but couldn't figure this out. Does anyone have any ideas? Thanks!
Pandas has a crosstab function that solve this; first you have to melt the dataframe:
box = df.melt()
pd.crosstab(box.value, box.variable)
variable group1 group2
value
maybe 0 1
no 1 2
yes 2 0
For performance, it is possible that groupby will be faster, even if it involves a few more steps:
box.groupby(["variable", "value"]).size().unstack("variable", fill_value=0)