
How to crosstab columns from dataframe based on a condition?


I often need cross tables for a preliminary analysis of my data. I can produce a basic cross table with pd.crosstab(df['column'], df['column']), but I fail to add a criterion (a logical expression) that restricts the cross table to a subset of my DataFrame.

I've tried pd.crosstab(df['health'], df['money']) if df['year']==1988 and several other positions for the if. I hope it's easy to solve, but I'm relatively new to Python and pandas.

import pandas as pd
df = pd.DataFrame({'year': ['1988', '1988', '1988', '1988', '1989', '1989', '1989', '1989'],
                   'health': ['2', '2', '3', '1', '3', '5', '2', '1'],
                   'money': ['5', '7', '8', '8', '3', '3', '7', '8']}).astype(int)

# cross table over all rows (1988 and 1989)
pd.crosstab(df['health'], df['money'])
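Why the if cannot work: written as above, expr if cond without an else is already a syntax error in Python, and even as a proper if statement it fails, because df['year'] == 1988 is not a single True/False value but a boolean Series with one entry per row. A small sketch on the question's data showing both the mask and the error pandas raises when that Series is used in an if:

```python
import pandas as pd

df = pd.DataFrame({'year': ['1988', '1988', '1988', '1988', '1989', '1989', '1989', '1989'],
                   'health': ['2', '2', '3', '1', '3', '5', '2', '1'],
                   'money': ['5', '7', '8', '8', '3', '3', '7', '8']}).astype(int)

# The comparison is evaluated element-wise and returns a boolean Series,
# one value per row, not a single True/False.
mask = df['year'] == 1988
print(mask.tolist())  # [True, True, True, True, False, False, False, False]

# Python's `if` needs one truth value; a Series of many values is ambiguous,
# so pandas raises ValueError instead of guessing.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The same boolean Series is exactly what boolean indexing (df[mask]) expects, which is why filtering the rows first is the usual way to restrict a crosstab.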

Solution

  • Filter the rows by boolean indexing before calling crosstab:

    df1 = df[df['year']==1988]
    df2 = pd.crosstab(df1['health'], df1['money'])
    

    EDIT: Alternatively, you can filter each column separately with the same mask:

    mask = df['year']==1988
    df2 = pd.crosstab(df.loc[mask, 'health'], df.loc[mask, 'money'])
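Both variants select the same rows, so they should yield the same table. A minimal sketch checking that on the question's data. The name crosstab_where is a hypothetical helper (not part of pandas) that wraps the filter-then-crosstab pattern for repeated pre-analysis; the second mask, which combines two conditions with &, is likewise only an illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': ['1988', '1988', '1988', '1988', '1989', '1989', '1989', '1989'],
                   'health': ['2', '2', '3', '1', '3', '5', '2', '1'],
                   'money': ['5', '7', '8', '8', '3', '3', '7', '8']}).astype(int)


def crosstab_where(frame, row, col, mask):
    """Hypothetical helper: cross-tabulate two columns for the rows where mask is True."""
    subset = frame[mask]
    return pd.crosstab(subset[row], subset[col])


mask = df['year'] == 1988

# Variant 1: filter the whole DataFrame first, then cross-tabulate.
df1 = df[mask]
t1 = pd.crosstab(df1['health'], df1['money'])

# Variant 2: filter each column with .loc and the same mask.
t2 = pd.crosstab(df.loc[mask, 'health'], df.loc[mask, 'money'])

# Same result through the hypothetical helper.
t3 = crosstab_where(df, 'health', 'money', mask)

# Several conditions: combine boolean Series with & (and) or | (or);
# each comparison needs its own parentheses because of operator precedence.
mask2 = (df['year'] == 1988) & (df['health'] >= 2)
t4 = crosstab_where(df, 'health', 'money', mask2)

print(t1)
print(t4)
```

For 1988 the table has one count each for the (health, money) pairs (1, 8), (2, 5), (2, 7) and (3, 8); health values that never occur in the filtered rows simply do not appear as rows.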