Search code examples
pythonpandaspivot-table

How to make a pandas crosstab with percentages?


Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})


pd.crosstab(df.A,df.B)


B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Expected output:

B       A     B    C
A               
one     .33  .33  .33
three   .33  .33  .33
two     .33  .33  .33

Solution

  • pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
    

    Basically you just have the function that does row/row.sum(), and you use apply with axis=1 to apply it by row.

    (If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)