Let's say this is the data
id1 | id2 | type | month.measure |
---|---|---|---|
105 | 50 | growing | 04-2020 |
105 | 50 | advancing | 04-2020 |
44 | 29 | advancing | 04-2020 |
105 | 50 | retreating | 05-2020 |
105 | 50 | shrinking | 05-2020 |
It would have to end like this
id1 | id2 | growing | shrinking | advancing | retreating | month.measure |
---|---|---|---|---|---|---|
105 | 50 | 1 | 0 | 1 | 0 | 04-2020 |
44 | 29 | 0 | 0 | 1 | 0 | 04-2020 |
105 | 50 | 0 | 1 | 0 | 1 | 05-2020 |
I've been trying with transformations of all kinds, lambda functions, pandas get_dummies
and trying to aggregate them grouped by the 2 ids and the date but I couldn't find a way.
Hope we can sort it out! Thanks in advance! :)
This solution uses pandas get_dummies
to one-hot encode the "TYPE" column, then concatenates the one-hot encoded dataframe back with the original, followed by a groupby applied to the ID columns and "MONTH":
# Set up the dataframe
ID1 = [105,105,44,105,105]
ID2 = [50,50,29,50,50]
TYPE = ['growing','advancing','advancing','retreating','shrinking']
MONTH = ['04-2020','04-2020','04-2020','05-2020','05-2020']
df = pd.DataFrame({'ID1':ID1,'ID2':ID2, 'TYPE':TYPE, 'MONTH.MEASURE':MONTH})
# Apply get_dummies and groupby operations
df = pd.concat([df.drop('TYPE',axis=1),pd.get_dummies(df['TYPE'])],axis=1)\
.groupby(['ID1','ID2','MONTH.MEASURE']).sum().reset_index()
# These bits are just cosmetic to get the output to look more like your required output
df.columns = [c.upper() for c in df.columns]
col_order = ['GROWING','SHRINKING','ADVANCING','RETREATING','MONTH.MEASURE']
df[['ID1','ID2']+col_order]
# ID1 ID2 GROWING SHRINKING ADVANCING RETREATING MONTH.MEASURE
# 0 44 29 0 0 1 0 04-2020
# 1 105 50 1 0 1 0 04-2020
# 2 105 50 0 1 0 1 05-2020