I have a data frame that looks like this :
I want to group the data frame by #PROD and #CURRENCY and replace TP with the contents of the Offshore data in the Loc column Without creating two data frames and joining them.
The final output will look something like:
I was able to create the output by splitting the data frame into two (Onshore and Offshore ) and then creating a join on #PROD and #CURRENCY. However, I was wondering if there is a cleaner way to do this ?
The Code for the Dataframe is :
import pandas as pd
data=[['Offshore','NY','A','USD','ABC_USD'],['Onshore','BH','A','USD',''], ['Onshore','AE','A','USD',''],\
['Offshore','NY','A','GBP','GBP_ABC'],['Onshore','BH','A','GBP',''], ['Onshore','AE','A','GBP',''],\
['Onshore','BH','A','EUR',''],['Onshore','AE','A','EUR','']]
df = pd.DataFrame(data, columns=['Loc', 'Country','#PROD','#CURRENCY','TP'])
df
I think a merge
is the most straightforward and efficient way to do this:
df['TP'] = df[cols].merge(df[df['Loc'].eq('Offshore')], how='left')['TP'].values
No need to sort, no need to worry about which values are initially present.
Alternatively:
cols = ['#PROD', '#CURRENCY']
s = (df[cols].reset_index().merge(df[df['Loc'].eq('Offshore')])
.set_index('index')['TP']
)
df.loc[s.index, 'TP'] = s
Output:
Loc Country #PROD #CURRENCY TP
0 Offshore NY A USD ABC_USD
1 Onshore BH A USD ABC_USD
2 Onshore AE A USD ABC_USD
3 Offshore NY A GBP GBP_ABC
4 Onshore BH A GBP GBP_ABC
5 Onshore AE A GBP GBP_ABC
6 Onshore BH A EUR NaN
7 Onshore AE A EUR NaN