Search code examples
pythonpandaspivot-tableaggregate-functions

How to create pivot table based on binary and categorical columns in DataFrame in Python Pandas?


I have DataFrame in Python Pandas like below:

Data type:

  • ID - numeric
  • U1, U2, U3 - numeric (binary 0/1)
  • CP - string (range)
ID U1 U2 U3 CP
111 1 1 0 10-20
222 1 0 1 10-20
333 0 1 0 20-30
444 0 1 1 40-50
555 1 0 0 10-20

And I need to aggregate above DataFrame using pivot table to have something like below:

  • COUNT_CP - How many ID have defined in index combination U and CP columns (for example combination U1 -'1' and CP_10_20 have 3 IDs)

  • COUNT_U - How many ID have '1' in each 'U' column (for example 3 IDs have '1' in column 'U1')

  • idx - indexes columns

    idx idx COUNT_CP COUNT_U
    U1 CP_10_20 3 3
    CP_20_30 0 3
    CP_30_40 0 3
    CP_40_50 0 3
    U2 CP_10_20 2 3
    CP_20_30 0 3
    CP_30_40 0 3
    CP_40_50 1 3
    U3 CP_10_20 1 2
    CP_20_30 0 2
    CP_30_40 0 2
    CP_40_50 1 2

How can I do that in Python Pandas ?


Solution

  • You could do this using melt and groupby

    import pandas as pd
    df = pd.DataFrame({'ID': {0: 111, 1: 222, 2: 333, 3: 444, 4: 555},
     'U1': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1},
     'U2': {0: 1, 1: 0, 2: 1, 3: 1, 4: 0},
     'U3': {0: 0, 1: 1, 2: 0, 3: 1, 4: 0},
     'CP': {0: '10-20', 1: '10-20', 2: '20-30', 3: '40-50', 4: '10-20'}})
    
    df = df.drop(columns='ID').melt(id_vars='CP', var_name='idx', value_name='COUNT_CP')
    df = df.groupby(['idx','CP'])['COUNT_CP'].sum().reset_index()
    df['COUNT_U'] = df.groupby('idx')['COUNT_CP'].transform(sum)
    df['CP'] = df['CP'].apply(lambda x: 'CP_' + '_'.join(x.split('-')))
    
    print(df)
    

    Output

      idx        CP  COUNT_CP  COUNT_U
    0  U1  CP_10_20         3        3
    1  U1  CP_20_30         0        3
    2  U1  CP_40_50         0        3
    3  U2  CP_10_20         1        3
    4  U2  CP_20_30         1        3
    5  U2  CP_40_50         1        3
    6  U3  CP_10_20         1        2
    7  U3  CP_20_30         0        2
    8  U3  CP_40_50         1        2