There is a DataFrame with some NaN values:
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]})
   A    B
0  1  1.0
1  1  1.0
2  1  NaN  <-
3  1  2.0
4  2  3.0
5  2  NaN  <-
6  2  3.0
7  2  4.0
Set column 'A' as the index:
df.set_index(['A'], inplace=True)
Now there are two groups with the indices 1 and 2:
     B
A
1  1.0
1  1.0
1  NaN  <-
1  2.0
2  3.0
2  NaN  <-
2  3.0
2  4.0
What is the best way to use fillna() to replace the NaN values with the most frequent value from each group?
I would like to make a call along these lines:
df.B.fillna(df.groupby('A').B...)
and get:
     B
A
1  1.0
1  1.0
1  1.0  <-
1  2.0
2  3.0
2  3.0  <-
2  3.0
2  4.0
Ideally, the solution would also work with a MultiIndex.
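Group by A and apply fillna() to B within each group.
Within each group, drop the missing values, count the remaining ones with value_counts, and use idxmax() to pick the most frequent value.
This assumes there are no groups where all values are missing, since idxmax() raises an error on an empty result: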
df['B'] = df.groupby('A')['B'].transform(lambda x: x.fillna(x.dropna().value_counts().idxmax()))
df
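If some groups might be entirely missing, or the index is a MultiIndex, a variation along the following lines may help. This is only a sketch, not part of the answer above: the helper name fill_with_group_mode and the second index level 'C' are made up for illustration, and it swaps value_counts().idxmax() for Series.mode(), which skips NaN by default and returns an empty Series when a group has no values at all. When several values tie, it takes the smallest one.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(series, levels):
    """Fill NaN in `series` with the most frequent value of each group.

    `levels` names the index level(s) to group by (hypothetical helper).
    Groups whose values are all missing are returned unchanged.
    """
    def _fill(group):
        modes = group.mode()  # ignores NaN; ties come back sorted ascending
        return group if modes.empty else group.fillna(modes.iloc[0])

    return series.groupby(level=levels).transform(_fill)


# The frame from the question, indexed by 'A'.
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                   'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]}).set_index('A')
filled = fill_with_group_mode(df['B'], 'A')

# A MultiIndex variant; the group (2, 'y') has no values to learn from.
multi = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                      'C': ['x', 'x', 'x', 'y', 'y', 'y'],
                      'B': [5, np.nan, 5, np.nan, np.nan, np.nan]}).set_index(['A', 'C'])
multi_filled = fill_with_group_mode(multi['B'], ['A', 'C'])
```

Assigning the result back with df['B'] = filled updates the column in place of the one-liner. Grouping with level= rather than a column name keeps the same call working for one index level or several.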