I have DataFrame in Python Pandas like below:
Data type:
ID | U1 | U2 | U3 | CP |
---|---|---|---|---|
111 | 1 | 1 | 0 | 10-20 |
222 | 1 | 0 | 1 | 10-20 |
333 | 0 | 1 | 0 | 20-30 |
444 | 0 | 1 | 1 | 40-50 |
555 | 1 | 0 | 0 | 10-20 |
And I need to aggregate above DataFrame using pivot table to have something like below:
COUNT_CP - How many ID have defined in index combination U and CP columns (for example combination U1 -'1' and CP_10_20 have 3 IDs)
COUNT_U - How many ID have '1' in each 'U' column (for example 3 IDs have '1' in column 'U1')
idx - indexes columns
idx | idx | COUNT_CP | COUNT_U |
---|---|---|---|
U1 | CP_10_20 | 3 | 3 |
CP_20_30 | 0 | 3 | |
CP_30_40 | 0 | 3 | |
CP_40_50 | 0 | 3 | |
U2 | CP_10_20 | 2 | 3 |
CP_20_30 | 0 | 3 | |
CP_30_40 | 0 | 3 | |
CP_40_50 | 1 | 3 | |
U3 | CP_10_20 | 1 | 2 |
CP_20_30 | 0 | 2 | |
CP_30_40 | 0 | 2 | |
CP_40_50 | 1 | 2 |
How can I do that in Python Pandas ?
You could do this using melt
and groupby
import pandas as pd
df = pd.DataFrame({'ID': {0: 111, 1: 222, 2: 333, 3: 444, 4: 555},
'U1': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1},
'U2': {0: 1, 1: 0, 2: 1, 3: 1, 4: 0},
'U3': {0: 0, 1: 1, 2: 0, 3: 1, 4: 0},
'CP': {0: '10-20', 1: '10-20', 2: '20-30', 3: '40-50', 4: '10-20'}})
df = df.drop(columns='ID').melt(id_vars='CP', var_name='idx', value_name='COUNT_CP')
df = df.groupby(['idx','CP'])['COUNT_CP'].sum().reset_index()
df['COUNT_U'] = df.groupby('idx')['COUNT_CP'].transform(sum)
df['CP'] = df['CP'].apply(lambda x: 'CP_' + '_'.join(x.split('-')))
print(df)
Output
idx CP COUNT_CP COUNT_U
0 U1 CP_10_20 3 3
1 U1 CP_20_30 0 3
2 U1 CP_40_50 0 3
3 U2 CP_10_20 1 3
4 U2 CP_20_30 1 3
5 U2 CP_40_50 1 3
6 U3 CP_10_20 1 2
7 U3 CP_20_30 0 2
8 U3 CP_40_50 1 2