How to fill na in pandas by the mode of a group


I have a pandas DataFrame like this:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   NaN
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   NaN
       a2                   b2
       a3                   NaN

Each value of a can correspond to several values of b. I want to fill every NaN in b with the mode of b within the group defined by the corresponding value of a.

The resulting dataframe should look like the following:

  df = 

       a                    b
       a1                   b1
       a1                   b2
       a1                   b1
       a1                   b1
       a2                   b1
       a2                   b2
       a2                   b2
       a2                   b2
       a2                   b2
       a3                   b2

Above, b1 is the mode of b for a1, and b2 is the mode of b for a2. a3 has no data for b, so its NaN is filled with the global mode, b2.

In other words, every NaN in column b should be replaced by the mode of b among the rows that share the same value of a.

EDIT:

If a group of a has no data for b at all, fill it with the global mode.


Solution

  • Try:

    # lazy grouping
    groups = df.groupby('a')
    
    # True for every row whose whole group is NaN in 'b'
    all_na = groups['b'].transform(lambda x: x.isna().all())
    
    # fill those groups with the global mode
    df.loc[all_na, 'b'] = df['b'].mode()[0]
    
    # fill the remaining NaNs with the mode of their own group
    mode_by_group = groups['b'].transform(lambda x: x.mode()[0])
    df['b'] = df['b'].fillna(mode_by_group)
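
For reference, here is a self-contained sketch of the same idea that can be run against the sample data from the question. It assumes the "Nan" entries are real missing values (np.nan), not the string "Nan". The helper name fill_with_group_mode is hypothetical. Instead of filling the all-NaN groups first, it falls back to the global mode inside the per-group function, so the result does not depend on the order of the two fill steps.

```python
import numpy as np
import pandas as pd

# Sample data from the question; "Nan" is assumed to mean a real missing value.
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})


def fill_with_group_mode(frame, key, col):
    """Hypothetical helper: fill NaNs in `col` with the mode of each `key` group,
    using the global mode for groups that have no data at all."""
    # Assumes the column has at least one non-missing value.
    global_mode = frame[col].mode()[0]

    def group_mode(s):
        # mode() ignores NaN by default, so an all-NaN group gives an empty result.
        modes = s.mode()
        return modes.iloc[0] if not modes.empty else global_mode

    # transform broadcasts each group's mode back onto that group's rows.
    per_row_mode = frame.groupby(key)[col].transform(group_mode)
    return frame[col].fillna(per_row_mode)


df["b"] = fill_with_group_mode(df, "a", "b")
print(df)
```

If a group has several equally frequent values, mode() returns them sorted, so this sketch picks the first one in sort order.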