Search code examples
pythonpandaspivot-table

How to combine multiple crosstabs while preserving same binary column


I would like stack 2 crosstabs for 2 categorical variables (gender, race) while preserving the same binary variable (goal) as columns.

Take for example this dataset:

df = pd.DataFrame({'GenderDSC':['Male','Female','Female','Male','Male','Male'],
             'Goal':[0,1,0,1,0,0],
             'Race':['African-American','White','White','Asian','Asian','White']})
df

Where crosstabs would output:

enter image description here enter image description here

But would like them to be merged like:

enter image description here

So far, I have tried:

a = pd.crosstab(df['GenderDSC'],df['Goal'])
b = pd.crosstab(df['Race'],df['Goal'])
pd.concat([a,b])

enter image description here

But I lose the groups for each columns (genderdsc, race), so thinking there could be a better way to connect them.


Solution

  • Use the argument keys from pd.concat

    >>> pd.concat([a,b], keys=('GenderDSC', 'Race'))
    

    From the docs,

    If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level.


    Goal                        0  1
    GenderDSC Female            1  1
              Male              3  1
    Race      African-American  1  0
              Asian             1  1
              White             2  1