Search code examples
pandasdataframegroup-bysumfillna

fill nan with string before groupby pandas


I have a table like this

src | dst | amount
------------------
a   | c   | 100
b   | c   | 50
a   | b   | 40
a   | nan | 10
nan | nan | 100

and i want a table like this, where I group by src and sum the amount sent, and if the src is nan is want to replace with "unknown".

entity | sent 
------------- 
a      | 150    
b      | 50    
uknown | 100

Solution

  • If nan is missing value replace it before aggregation:

    df = (df.fillna({'src': 'unknown'})
            .groupby('src')['amount']
            .sum()
            .rename_axis('entity')
            .reset_index(name='sent'))
    print (df)
        entity  sent
    0        a   150
    1        b    50
    2  unknown   100
    

    Or if 'nan' is string use:

    df = (df.replace({'src': {'nan': 'unknown'}})
            .groupby('src')['amount']
            .sum()
            .rename_axis('entity')
            .reset_index(name='sent'))
    print (df)
        entity  sent
    0        a   150
    1        b    50
    2  unknown   100
    

    Another ideas:

    df.groupby(df['src'].fillna('unknown').rename('entity'))['sent'].sum().reset_index()
    

    df.groupby(df['src'].replace('nan','unknown').rename('entity'))['sent'].sum().reset_index()