Search code examples
pythonpython-3.xpandasdataframefillna

Pandas fillna multiple columns with values from corresponding columns without repeating for each


Let's say I have a DataFrame like this:

x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,                                        
                  'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan], 
                  'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
                  'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})

And I want to fill the NaN values in col_x with the values in col_y respectively, I can do this:

x['col1_x'] = x['col1_x'].fillna(x['col1_y'])
x['col2_x'] = x['col2_x'].fillna(x['col2_y'])

print(x)

Which will yield:

   col1_x  col2_x  col1_y  col2_y
0    15.0    93.0      10      93
1    20.0    24.0      20      24
2   136.0    51.0      30      52
3    93.0    22.0      40     246
4   743.0    38.0      50     142
5    60.0    53.0      60      53
6    70.0    72.0      70      94
7    91.0     2.0      80       2

But requires to repeat the same function with different variables, now let's assume that I have a bigger DataFrame with much more columns, is it possible to do it without repeating?


Solution

    • you can use **kwargs to assign()
    • build up a dict with a comprehension to build **kwargs
    import pandas as pd
    import numpy as np
    x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,                                        
                      'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan], 
                      'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
                      'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})
    
    x.assign(**{c:x[c].fillna(x[c.replace("_x","_y")]) for c in x.columns if "_x" in c})
    
    col1_x col2_x col1_y col2_y
    0 15 93 10 93
    1 20 24 20 24
    2 136 51 30 52
    3 93 22 40 246
    4 743 38 50 142
    5 60 53 60 53
    6 70 72 70 94
    7 91 2 80 2

    How does it work

    # core - loop through columns that end with _x and generate it's pair column _y
    {c:c.replace("_x","_y") 
     for c in x.columns if "_x" in c}
    
    # now we have all the pairs of a columns let's do what we want - fillna()
    {c:x[c].fillna(x[c.replace("_x","_y")]) for c in x.columns if "_x" in c}
    
    # this dictionary matches this function.... https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
    # so final part is call the function with **kwargs
    x.assign(**{c:x[c].fillna(x[c.replace("_x","_y")]) 
                for c in x.columns if "_x" in c})