python pandas dataframe fill missing-data

pandas: What is the best way to do fillna() on a (multiindexed) DataFrame with the most frequent value from every group?

There is a DataFrame with some NaN values:

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]})

   A    B
0  1  1.0
1  1  1.0
2  1  NaN <-
3  1  2.0
4  2  3.0
5  2  NaN <-
6  2  3.0
7  2  4.0

Set column 'A' as the index:

df.set_index(['A'], inplace=True)

Now there are two groups, with index values 1 and 2:

     B
A     
1  1.0
1  1.0
1  NaN <-
1  2.0
2  3.0
2  NaN <-
2  3.0
2  4.0

What is the best way to do fillna() on the DataFrame with the most frequent value from each group?

I would like to call something like this:

df.B.fillna(df.groupby('A').B...)

and get:

     B
A     
1  1.0
1  1.0
1  1.0 <-
1  2.0
2  3.0
2  3.0 <-
2  3.0
2  4.0

I hope there is a way to do this that also works with a MultiIndex.

Solution

Group by A and apply fillna() to B within each group;
within each group, drop the missing values from the series, call value_counts(), and use idxmax() to pick the most frequent value.
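
To make the two steps above concrete, here is a minimal sketch of what happens inside a single group. It uses the values of B for the group A == 1 from the question; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Values of column B for the group A == 1 in the question's example.
group = pd.Series([1.0, 1.0, np.nan, 2.0])

# Step 1: drop missing values and count how often each value occurs.
counts = group.dropna().value_counts()

# Step 2: idxmax() returns the index label with the largest count,
# which is the most frequent value itself.
most_frequent = counts.idxmax()

# Step 3: fill the gaps in this group with that value.
filled = group.fillna(most_frequent)
print(filled)
```

transform() in the one-liner runs the same logic once per group and returns the results in the original row order.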

Assuming there are no groups where all values are missing (idxmax() raises a ValueError on an empty series):

df['B'] = df.groupby('A')['B'].transform(lambda x: x.fillna(x.dropna().value_counts().idxmax()))
df
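
The question also asks about a MultiIndex, and the one-liner above does not cover groups that are entirely NaN. The following is a rough sketch of one way to handle both, not a definitive implementation. The helper name fill_with_most_frequent, the second index level 'C', and the extra all-NaN group are made up for illustration, and leaving an all-NaN group unchanged is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def fill_with_most_frequent(s: pd.Series) -> pd.Series:
    """Fill NaN in s with its most frequent non-missing value (hypothetical helper)."""
    counts = s.value_counts()  # value_counts() skips NaN by default
    if counts.empty:
        # Assumption: a group with no observed values is left unchanged.
        return s
    return s.fillna(counts.idxmax())


# The question's data, plus an illustrative second index level 'C'
# and a third group (A == 3) in which every value is missing.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'C': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'z', 'z'],
    'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4, np.nan, np.nan],
}).set_index(['A', 'C'])

# Group on the index levels by name; a single level such as level='A' works the same way.
df['B'] = df.groupby(level=['A', 'C'])['B'].transform(fill_with_most_frequent)
print(df)
```

Grouping with level= makes it explicit that the keys come from the index rather than from columns. Since value_counts() already ignores NaN by default, the separate dropna() call is not needed in the helper.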