Tags: python, pandas, dataframe, performance, fillna

How to efficiently fill a column of a dataframe based on a dictionary


I have a dataframe and a dictionary like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9
})

di = {
    1: ['a', 'b'],
    2: ['c', 'd'],
    3: ['e', 'f', 'g']
}

   A  ignore_me  fill_me
0  1          0      NaN
1  1          1      NaN
2  1          2      NaN
3  2          3      NaN
4  2          4      NaN
5  3          5      NaN
6  3          6      NaN
7  3          7      NaN
8  3          8      NaN

The entries in A of df correspond to the keys in di. I would now like to fill the column fill_me using the values of di, so my desired outcome looks like this:

   A  ignore_me fill_me
0  1          0       a
1  1          1       b
2  1          2     NaN
3  2          3       c
4  2          4       d
5  3          5       e
6  3          6       f
7  3          7       g
8  3          8     NaN

One way of achieving this is as follows:

df_list = []
for key, values in di.items():
    temp_df = df[df['A'] == key].reset_index(drop=True)
    fill_df = pd.DataFrame({'A': [key]* len(values), 'fill_me': values})
    df_list.append(temp_df.combine_first(fill_df))

final_df = pd.concat(df_list, ignore_index=True)

which gives me the desired outcome. However, it requires a loop and a concat, and it also creates a new dataframe. Does anyone see a more straightforward way of implementing this? Ideally, I could "just" fill df through a clever use of fillna or map.


Solution

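  • You can use groupby().cumcount() to create a positional key within each group of A, build the same key for the exploded dictionary values, and then left-merge on A and key:

    # One row per (dictionary key, list element)
    s = pd.Series(di).explode().reset_index()
    s.columns = ['A', 'fill']

    # Position of each row within its group of equal A values
    df['key'] = df.groupby('A').cumcount()
    s['key'] = s.groupby('A').cumcount()

    # Rows of df without a matching (A, key) pair get NaN in fill
    df.merge(s, how='left', on=['A', 'key'])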
    Out[463]: 
       A  ignore_me  fill_me  key fill
    0  1          0      NaN    0    a
    1  1          1      NaN    1    b
    2  1          2      NaN    2  NaN
    3  2          3      NaN    0    c
    4  2          4      NaN    1    d
    5  3          5      NaN    0    e
    6  3          6      NaN    1    f
    7  3          7      NaN    2    g
    8  3          8      NaN    3  NaN
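
The merge above leaves the values in a separate fill column and adds a helper key column, whereas the desired outcome in the question has them in fill_me. A minimal sketch of one way to finish the job, assuming the df and di from the question: compute the key without storing it in df, merge only the columns that are needed, and write the result back into fill_me. The names key and merged are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9,
})
di = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f', 'g']}

# One row per (dictionary key, list element), plus its position in the list.
s = pd.Series(di).explode().reset_index()
s.columns = ['A', 'fill']
s['key'] = s.groupby('A').cumcount()

# Position of each row of df within its group of equal A values.
# Kept as a separate Series so df does not gain a helper column.
key = df.groupby('A').cumcount().rename('key')

# Each (A, key) pair occurs at most once in s, so the left merge
# returns exactly one row per row of df, in the same order.
merged = df[['A']].assign(key=key).merge(s, how='left', on=['A', 'key'])

# Assign by position so this also works if df has a non-default index.
df['fill_me'] = merged['fill'].to_numpy()

print(df)
```

This still builds a small intermediate frame for the merge, but it avoids the Python-level loop and the concat, and df keeps its original three columns.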