I have a dataframe and a dictionary like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9
})
di = {
    1: ['a', 'b'],
    2: ['c', 'd'],
    3: ['e', 'f', 'g']
}
   A  ignore_me  fill_me
0  1          0      NaN
1  1          1      NaN
2  1          2      NaN
3  2          3      NaN
4  2          4      NaN
5  3          5      NaN
6  3          6      NaN
7  3          7      NaN
8  3          8      NaN
The entries in column A of df correspond to the keys in di. I would now like to fill the column fill_me using the values of di, so my desired outcome looks like this:
   A  ignore_me  fill_me
0  1          0        a
1  1          1        b
2  1          2      NaN
3  2          3        c
4  2          4        d
5  3          5        e
6  3          6        f
7  3          7        g
8  3          8      NaN
One way of achieving this is as follows:
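df_list = []
for key, values in di.items():
    # Rows of df belonging to this key, re-indexed from 0
    temp_df = df[df['A'] == key].reset_index(drop=True)
    # One row per value in di[key], also indexed from 0
    fill_df = pd.DataFrame({'A': [key] * len(values), 'fill_me': values})
    # Fill the NaNs in temp_df from fill_df, aligned on the 0-based index
    df_list.append(temp_df.combine_first(fill_df))
final_df = pd.concat(df_list, ignore_index=True)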
This gives me the desired outcome. However, it requires a loop and a concat, and it also creates a new dataframe. Does anyone see a more straightforward way of implementing this? Ideally I could "just" fill df in place with a smart use of fillna or map.
You can use cumcount to create a per-group position key on both sides and then merge on it:
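# One row per (key, value) pair from the dictionary
s = pd.Series(di).explode().reset_index()
s.columns = ['A', 'fill']
# Number the rows within each group of A: 0, 1, 2, ...
df['key'] = df.groupby('A').cumcount()
s['key'] = s.groupby('A').cumcount()
# Left merge on the shared columns A and key
df.merge(s, how='left')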
Out[463]:
   A  ignore_me  fill_me  key  fill
0  1          0      NaN    0     a
1  1          1      NaN    1     b
2  1          2      NaN    2   NaN
3  2          3      NaN    0     c
4  2          4      NaN    1     d
5  3          5      NaN    0     e
6  3          6      NaN    1     f
7  3          7      NaN    2     g
8  3          8      NaN    3   NaN
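If the goal is to write the values into fill_me itself rather than into a new fill column, the same cumcount idea can be used to assign the merged values back. The following is a minimal sketch under that assumption; the names s, key and merged are purely illustrative, and it relies on a left merge keeping one output row per left row in the left frame's order (which holds here because each (A, key) pair in s is unique).

```python
import numpy as np
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9,
})
di = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f', 'g']}

# One row per (key, value) pair, numbered within each key
s = pd.Series(di).explode().reset_index()
s.columns = ['A', 'fill']
s['key'] = s.groupby('A').cumcount()

# Position of each df row within its group of A (not stored on df)
key = df.groupby('A').cumcount()

# Left merge: one output row per df row, in df's row order,
# so the 'fill' column lines up positionally with df
merged = df[['A']].assign(key=key).merge(s, on=['A', 'key'], how='left')
df['fill_me'] = merged['fill'].to_numpy()
print(df)
```

Because the position key is kept in a separate Series, df ends up with only its original three columns, and rows without a matching list entry stay NaN.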