pandas, outer-join, pass-by-value

How to assign a variable to a merged pandas DataFrame within a function


I'd like the dataframe passed into this function to be modified.

import pandas as pd

def func(df):
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    # This rebinds only the local name df; the caller's dataframe is not touched.
    df = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)
    print("df is now a merged dataframe!")

test = pd.DataFrame()
func(test)

However, since Python passes object references by value, the callee func() gets its own local name df, which initially refers to the same empty dataframe as test. Assigning the result of pd.merge() to df does not modify that dataframe; it rebinds the local name df to the new object that pd.merge() returns. As a result, test is unchanged and still refers to the original empty dataframe.
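
The difference between rebinding a name and mutating the object it refers to can be seen in a minimal sketch. The function names rebind and mutate are hypothetical and exist only for illustration; the sketch assumes the frame passed in is empty.

```python
import pandas as pd

def rebind(df):
    # Assignment binds the local name df to a new object;
    # the caller's dataframe is left untouched.
    df = pd.DataFrame({'A': [1, 3]})

def mutate(df):
    # Column assignment modifies the object that both the
    # caller and the callee refer to.
    df['A'] = [1, 3]

rebound = pd.DataFrame()
rebind(rebound)
print(rebound.empty)   # True: the caller still sees the empty frame

mutated = pd.DataFrame()
mutate(mutated)
print(mutated.empty)   # False: the caller sees the new column
```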

How can we merge in place inside func() so that test is actually changed? I'd like something like pandas.DataFrame.update(), but that only supports left joins.


Solution

  • If I understand correctly, something like this? It uses global to rebind the module-level name test inside the function.

    import pandas as pd

    def func(df):
        # Declare that assignments to test refer to the module-level name.
        global test
        left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
        right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
        df = pd.merge(left_df, right_df, how='outer', left_index=True, right_index=True)
        print("df is now a merged dataframe!")
        # Rebind the global name; the argument passed in is not used for this.
        test = df

    test = pd.DataFrame()
    func(test)
    print(test)

    Output:

    df is now a merged dataframe!
       A  B  C  D
    0  1  2  5  6
    1  3  4  7  8
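
The global approach ties func() to one particular variable name. A rough sketch of two alternatives follows: returning the merged frame and rebinding at the call site, or copying the merged columns into the frame that was passed in. The helper names build_merged and fill_in_place are hypothetical, and the in-place variant assumes the passed frame is empty; for a non-empty frame, column assignment aligns on the existing index and would not behave like an outer join.

```python
import pandas as pd

def build_merged():
    # Same data as in the question; returns the outer-merged frame.
    left_df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
    right_df = pd.DataFrame([[5, 6], [7, 8]], columns=['C', 'D'])
    return pd.merge(left_df, right_df, how='outer',
                    left_index=True, right_index=True)

def fill_in_place(df):
    # Assumption: df is empty. Assigning columns one by one mutates
    # the caller's object instead of rebinding a local name.
    merged = build_merged()
    for col in merged.columns:
        df[col] = merged[col]

# Option 1: return the result and rebind the name at the call site.
test = build_merged()

# Option 2: mutate the (empty) frame that was passed in.
other = pd.DataFrame()
fill_in_place(other)
print(other.shape)   # (2, 4)
```

Returning the result is usually the simpler choice, since the caller decides which name receives the merged frame.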