Search code examples
pythonpandasmerge

Avoiding Merge In Pandas


I have a data frame that looks like this :

enter image description here

I want to group the data frame by #PROD and #CURRENCY and replace TP with the contents of the Offshore data in the Loc column Without creating two data frames and joining them.

The final output will look something like:

enter image description here

I was able to create the output by splitting the data frame into two (Onshore and Offshore ) and then creating a join on #PROD and #CURRENCY. However, I was wondering if there is a cleaner way to do this ?

The Code for the Dataframe is :

import pandas as pd
data=[['Offshore','NY','A','USD','ABC_USD'],['Onshore','BH','A','USD',''],                  ['Onshore','AE','A','USD',''],\
     ['Offshore','NY','A','GBP','GBP_ABC'],['Onshore','BH','A','GBP',''],                           ['Onshore','AE','A','GBP',''],\
     ['Onshore','BH','A','EUR',''],['Onshore','AE','A','EUR','']]

df = pd.DataFrame(data, columns=['Loc', 'Country','#PROD','#CURRENCY','TP'])
df

Solution

  • I think a merge is the most straightforward and efficient way to do this:

    df['TP'] = df[cols].merge(df[df['Loc'].eq('Offshore')], how='left')['TP'].values
    

    No need to sort, no need to worry about which values are initially present.

    Alternatively:

    cols = ['#PROD', '#CURRENCY']
    s = (df[cols].reset_index().merge(df[df['Loc'].eq('Offshore')])
         .set_index('index')['TP']
        )
    df.loc[s.index, 'TP'] = s
    

    Output:

            Loc Country #PROD #CURRENCY       TP
    0  Offshore      NY     A       USD  ABC_USD
    1   Onshore      BH     A       USD  ABC_USD
    2   Onshore      AE     A       USD  ABC_USD
    3  Offshore      NY     A       GBP  GBP_ABC
    4   Onshore      BH     A       GBP  GBP_ABC
    5   Onshore      AE     A       GBP  GBP_ABC
    6   Onshore      BH     A       EUR      NaN
    7   Onshore      AE     A       EUR      NaN