Search code examples
pandasdictionarymergechaining

pandas outer join and improve pandas naming of left vs right table entries in resulting join


I would like to in a single "chained" statement merge two tables, and label in the resulting table those entries from the left table vs right table. Pandas defaults this to "left_only", "right_only", "both" and can be seen by invoking indicator = True in the merge statement.However, I would like to change these to something more helpful via method chaining in Pandas.

This is what I am currently doing - rather inelegantly- without chaining of the merge result.

   oran = pd.DataFrame(columns = ['Name', 'type'], data = [['oranger', 'FP'],
                       ['oranged', 'CP'], ['orangeo', 'CP'], ['orangel', 'CP'], ['orangey', 'NP']])

   other =  pd.DataFrame(columns = ['Name', 'type'], data = [['appler', 'FP'],
                       ['appled', 'CP'], ['appleo', 'CP'], ['orangel', 'CP'],  ['orangey', 'NP']])


 # Merge two dataframe above; add indicator column for table source
   df = pd.merge(oran['Name'], other['Name'],on = 'Name', how = 'outer', indicator = True)

   dict_map = {'both':'Both','left_only':'oran only',
            'right_only':'other only'}

   df["indicator"] = df["_merge"].map(dict_map).

   df.drop("_merge", axis = 1, inplace = True)

   df

The above approach gives me the output desired. However, the code involves multiple steps. How can I generate the same output as shown below via method chaining in Pandas?

enter image description here


Solution

  • Setting directly the correct name for "indicator" and a simple assign would work here:

    df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator='indicator')
            .assign(indicator=lambda d: d['indicator'].map(dict_map))
          )
    

    If we imagine that setting up the name was not possible, we could drop and return the column with pop:

    df = (pd.merge(oran['Name'], other['Name'], on='Name', how='outer', indicator=True)
            .assign(indicator=lambda d: d.pop('_merge').map(dict_map))
          )
    

    Output:

          Name   indicator
    0  oranger   oran only
    1  oranged   oran only
    2  orangeo   oran only
    3  orangel        Both
    4  orangey        Both
    5   appler  other only
    6   appled  other only
    7   appleo  other only