Search code examples
pythonpandasfillna

Filling NA values in one dataframe by another based on two columns


What is the most efficient way to fill NA values in one dataframe using another when there are more than one column to match (so in this case city and rooms)?

Example dataframes to be combined and the result dataframe:

import pandas as pd
import numpy as np

d1 = {'city' : ['New York', 'Shanghai', 'Boston', 'Shanghai', 
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent': 
 [500, np.nan, 1500, 2000, np.nan]}


d2 = {'city' : ['Shanghai'],
'rooms': ["2"], 'rent': [1000]}

df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)

result = {'city' : ['New York', 'Shanghai','Boston', 'Shanghai', 
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent': 
[500, 1000, 1500, 2000, 1000]}

result_df = pd.DataFrame(data = result)

Solution

  • Set the indices on both columns to align, and fill on the required column. In this case, the common columns are city and rooms:

    cols = ['city', 'rooms']
    

    Set index for df1:

    df1 = df1.set_index(cols)
    

    Set index for df2 :

    df2 = df2.set_index(cols).rent # make it a Series
    

    Fill df1 with df2 and reset index (indexes are good/useful):

    df1.fillna({"rent": df2}).reset_index()
    
           city rooms floor    rent
    0  New York     1     4   500.0
    1  Shanghai     2     5  1000.0
    2    Boston     3     6  1500.0
    3  Shanghai     2    10  2000.0
    4  Shanghai     2     8  1000.0
    

    Note that this works only if the data from df2 is unique