python, pandas, crosstab

Run multiple cross tabulations with a function in pandas


Hi, I am trying to make some contingency tables. I want to do it in a function so I can reuse it for various columns, dataframes, and combinations.

Currently I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame(data={'group' : ['A','A','B','B','C','D'],
                        'class': ['g1','g2','g2','g3','g1','g2'],
                        'total' : ['0-10','20-30','0-10','30-40','50-60','20-30'],
                        'sub' : ['1-4', '5-9','10-14', '15-19','1-4','15-19'],
                        'n': [3,14,12,11,21,9]})

and a function that looks like this:

def cts(tabs, df):
    out=[]
    for col in df.loc[:,df.columns != tabs]:
        a = pd.crosstab([df[tabs]], df[col])
        out.append(a)
    return(out)
cts('group', df)

This works for cross tabulating one column against each of the others. But I want to add two (or more!) levels to the grouping, e.g.

pd.crosstab([df['group'], df['class']], df['total'])

where total is cross tabulated against both group and class.

I think the 'tabs' argument of the function should be a list of column names, but when I try to make it a list I get invalid syntax errors. I hope this makes sense. Thank you!


Solution

  • Pass tabs as a list of column names and build the row keys with a list comprehension. Try:

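    def cts(tabs, df):
        # tabs must be a list of column names, e.g. ['group', 'class']
        out = []
        # every column that is not one of the grouping columns
        cols = [col for col in df.columns if col not in tabs]
        for col in cols:
            # one Series per grouping column gives multi-level rows
            a = pd.crosstab([df[tab] for tab in tabs], df[col])
            out.append(a)
        return out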
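
A possible variation on the function above, as a minimal sketch rather than the only way to do it. It assumes you might still want to pass a single column name as a plain string, and that a dict keyed by column name is easier to use than a list. Both choices are my own additions, not part of the original answer. The sample dataframe is the one from the question.

```python
import pandas as pd

def cts(tabs, df):
    # Assumption: allow a bare string for a single grouping column.
    if isinstance(tabs, str):
        tabs = [tabs]
    out = {}
    for col in df.columns:
        if col in tabs:
            continue  # skip the grouping columns themselves
        # A list of Series as the first argument gives multi-level rows.
        out[col] = pd.crosstab([df[tab] for tab in tabs], df[col])
    return out

df = pd.DataFrame(data={'group': ['A', 'A', 'B', 'B', 'C', 'D'],
                        'class': ['g1', 'g2', 'g2', 'g3', 'g1', 'g2'],
                        'total': ['0-10', '20-30', '0-10', '30-40', '50-60', '20-30'],
                        'sub': ['1-4', '5-9', '10-14', '15-19', '1-4', '15-19'],
                        'n': [3, 14, 12, 11, 21, 9]})

tables = cts(['group', 'class'], df)
print(tables['total'])  # 'total' cross tabulated against group and class
```

Calling cts('group', df) still works with this version and returns one table per remaining column, including 'class'.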