I have a pandas DataFrame like this:
df =
a b
a1 b1
a1 b2
a1 b1
a1 NaN
a2 b1
a2 b2
a2 b2
a2 NaN
a2 b2
a3 NaN
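To make the example reproducible, the frame above could be built roughly like this. This is only a sketch: it assumes the missing entries are real `np.nan` values and that both columns hold plain strings.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; np.nan marks the missing entries in column b
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})
print(df)
```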
For every value of `a`, there can be multiple values of `b`. I want to fill all the NaN values of `b` with the mode of `b` within the group defined by the corresponding value of `a`.
The resulting dataframe should look like the following:
df =
a b
a1 b1
a1 b2
a1 b1
a1 **b1**
a2 b1
a2 b2
a2 b2
a2 **b2**
a2 b2
a3 **b2**
Above, `b1` was the mode of `b` corresponding to `a1`. Similarly, `b2` was the mode corresponding to `a2`. Finally, `a3` had no data, so it is filled with the global mode, `b2`.
In other words, every NaN in column `b` should be replaced by the mode of `b` among the rows that share the same value of `a`.
EDIT:
If there is a group in `a` for which there is no data in `b`, then fill it with the global mode.
Try:
# lazy grouping
groups = df.groupby('a')
# True for rows whose group has only NaN in b
all_na = groups['b'].transform(lambda x: x.isna().all())
# fill those groups with the global mode first, so no group is left empty
df.loc[all_na, 'b'] = df['b'].mode()[0]
# mode of b within each group, broadcast back to every row
mode_by_group = groups['b'].transform(lambda x: x.mode()[0])
# fill the remaining NaN values with the local mode
df['b'] = df['b'].fillna(mode_by_group)
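For reference, here is the same idea wrapped in a self-contained function that can be run against the example data. It is a sketch rather than the only way to do it: the name `fill_with_group_mode` and its `key`/`col` parameters are made up for illustration, and it works on a copy instead of modifying `df` in place.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(df, key="a", col="b"):
    """Fill NaN in `col` with the mode of its `key` group, else the global mode."""
    global_mode = df[col].mode()  # mode() ignores NaN by default
    if global_mode.empty:
        return df.copy()  # the whole column is NaN, nothing to fill with

    def group_mode(s):
        m = s.mode()
        # A group with only NaN has no mode of its own: use the global one
        return m.iloc[0] if not m.empty else global_mode.iloc[0]

    out = df.copy()
    # transform broadcasts each group's mode back to the rows of that group
    out[col] = out[col].fillna(out.groupby(key)[col].transform(group_mode))
    return out

# Example data from the question (assumed to use np.nan for missing values)
df = pd.DataFrame({
    "a": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a2", "a3"],
    "b": ["b1", "b2", "b1", np.nan, "b1", "b2", "b2", np.nan, "b2", np.nan],
})
result = fill_with_group_mode(df)
print(result)
```

Note that `mode()` returns its values sorted, so when a group has a tie, taking the first element picks the smallest of the tied values.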