Search code examples
pandascrosstabcontingency

How to generate a contingency table with counts using pandas?


Let's say I have data like this:

+-------+--------+--------------+--------+--------------+
| index | used_x | clicked_in_x | used_y | clicked_in_y |
+-------+--------+--------------+--------+--------------+
|     1 | True   | False        | True   | True         |
|     2 | False  | False        | True   | False        |
|     3 | True   | True         | False  | False        |
+-------+--------+--------------+--------+--------------+

And I would like to generate a contingency table using using pandas that shows a table like:

+--------+----------------+----------------+
|        | clicked_from_x | clicked_from_y |
+--------+----------------+----------------+
| used_x |             40 |              3 |
| used_y |              2 |             10 |
+--------+----------------+----------------+

What would be the best way to achieve this? So far I've tried the following using the crosstab method:

import numpy as np
import pandas as pd

size = 20

df = pd.DataFrame({
    'used_x': np.random.choice(a=[False, True], size=size),
    'clicked_from_x': np.random.choice(a=[False, True], size=size),
    'used_y':  np.random.choice(a=[False, True], size=size),
    'clicked_from_y':  np.random.choice(a=[False, True], size=size),
})

pd.crosstab([df['used_x'], df['used_y']], [df['clicked_from_x'], df['clicked_from_y']],  margins=False)

Which yields:

enter image description here

But this is pretty hard to make sense of and not the representation I'd hoped for. Does anyone know how to get my desired outcome, or perhaps an equivalent strategy using pandas?


Solution

  • We'll be using the almighty dot product subroutine here.

    i = df.filter(like='clicked')
    j = df.filter(like='used')
    
    j.astype(int).T.dot(i)
    
            clicked_from_x  clicked_from_y
    used_x               6               5
    used_y               6               6