Search code examples
pandasgroup-by

Ignoring the NaN rows when collapsing data by group_by in Pandas


I would like to collapse a dataset using groupby and agg and I want to preserve the NaN values as separate rows without collapsing. For example, the initial data is:

  mydata = {'category' : [np.NaN, np.NaN, "a", "b", "b", "c"],'category2' : ["f", "f", "r", "h", "h", "j"], 'value' : ['12', '11', '10', '13', '15', '11']}
df = pd.DataFrame(mydata, columns=['category','category2','value'])

df=df.groupby(['category','category2'],  dropna=False 
).agg({'value':'max'})

    category    category2   value
0   NaN              f       12
1   NaN              f       11
2   a                r       10
3   b                h       13
4   b                h       15
5   c                j       11

Current output:

  category  category2   value
0   NaN        f          12
1   a          r          10
2   b          h          15
3   c          j          11

Desired output:

  category  category2   value
0   NaN        f          12
1   NaN        f          11
2   a          r          10
3   b          h          15
4   c          j          11

Is there a way to do this with group_by, other than splitting the dataframe to two NaN and non-NaN dataframes?


Solution

  • Concat rows of both cases (with NaN values and aggregated without NaN):

    df = pd.concat([df[df['category'].isna()],
                    df.groupby(['category','category2'])
                   .agg({'value':'max'}).reset_index()])
    

      category category2 value
    0      NaN         f    12
    1      NaN         f    11
    0        a         r    10
    1        b         h    15
    2        c         j    11