Search code examples
pythonpandasdataframepivot-tablecontingency

pandas crosstab for two columns


I am trying to make a contingency table using pd.crosstab from my local dataframe. Imagine we asked 3 people in 2 separate groups the question of whether they like ice cream or not, and here is the result in a dataframe:

group1    | group2    
------------------
yes       | no    
no        | maybe 
yes       | no    

And i would like the contingency table to look like this:

      | group1     | group2
----------------------------
yes   | 2          | 0
no    | 1          | 2
maybe | 0          | 1 

I have played around with pandas and evidently referenced many different resources, including the docs and other posts, but couldn't figure this out. Does anyone have any ideas? Thanks!


Solution

  • Pandas has a crosstab function that solve this; first you have to melt the dataframe:

    box = df.melt()
    pd.crosstab(box.value, box.variable)
    
    variable    group1  group2
    value       
    maybe          0    1
    no             1    2
    yes            2    0
    

    For performance, it is possible that groupby will be faster, even if it involves a few more steps:

    box.groupby(["variable", "value"]).size().unstack("variable", fill_value=0)