I would like to collapse a dataset using groupby and agg and I want to preserve the NaN values as separate rows without collapsing. For example, the initial data is:
mydata = {'category' : [np.NaN, np.NaN, "a", "b", "b", "c"],'category2' : ["f", "f", "r", "h", "h", "j"], 'value' : ['12', '11', '10', '13', '15', '11']}
df = pd.DataFrame(mydata, columns=['category','category2','value'])
df=df.groupby(['category','category2'], dropna=False
).agg({'value':'max'})
category category2 value
0 NaN f 12
1 NaN f 11
2 a r 10
3 b h 13
4 b h 15
5 c j 11
Current output:
category category2 value
0 NaN f 12
1 a r 10
2 b h 15
3 c j 11
Desired output:
category category2 value
0 NaN f 12
1 NaN f 11
2 a r 10
3 b h 15
4 c j 11
Is there a way to do this with group_by
, other than splitting the dataframe to two NaN and non-NaN dataframes?
Concat rows of both cases (with NaN
values and aggregated without NaN
):
df = pd.concat([df[df['category'].isna()],
df.groupby(['category','category2'])
.agg({'value':'max'}).reset_index()])
category category2 value
0 NaN f 12
1 NaN f 11
0 a r 10
1 b h 15
2 c j 11