I am trying to replace the values of a dataframe column with the value of the same column but from another row.
The ID of the COUNTRY with type "TO_REPLACE" must be replaced by the ID of the same COUNTRY with type "ORIGINAL".
Example DF:
ID NAME TYPE
---
ID1 COUNTRY1 ORIGINAL <-- ORIGINAL of COUNTRY1 (ID1)
ID2 COUNTRY1 TO_REPLACE1 <-- ID must be ID1
ID3 COUNTRY1 TO_REPLACE2 <-- ID must be ID1
...
ID4 COUNTRYX TO_REPLACE1 <-- ID must be ID6
ID5 COUNTRYX TO_REPLACE2 <-- ID must be ID6
ID6 COUNTRYX ORIGINAL <-- ORIGINAL of COUNTRYX (ID6)
I need: if -TYPE- is "TO_REPLACE1" or "TO_REPLACE2" then find that -NAME- with -TYPE- "ORIGINAL" and put the "ORIGINAL ID" on ID. At the end in this example all -NAME- COUNTRY1 must be ID1 (the "ORIGINAL" ID) and COUNTRYX must be ID6.
Result:
ID NAME TYPE
---
ID1 COUNTRY1 ORIGINAL
ID1 COUNTRY1 TO_REPLACE1 <-- ID replaced
ID1 COUNTRY1 TO_REPLACE2 <-- ID replaced
...
ID6 COUNTRYX TO_REPLACE1 <-- ID replaced
ID6 COUNTRYX TO_REPLACE2 <-- ID replaced
ID6 COUNTRYX ORIGINAL
After many attempts, the closest is this one:
df.loc[(df['TYPE'].isin(['TO_REPLACE1', 'TO_REPLACE2']), 'ID'] = df.loc[df['TYPE'] == 'ORIGINAL', 'ID'].values[0]
But replace only by the first "ORIGINAL ID" found.
You can mask
/where
and use a groupby.transform
:
df['ID'] = (df['ID']
.where(df['TYPE'].eq('ORIGINAL'))
.groupby(df['NAME']).transform('first')
)
Or with map
:
df['ID'] = df['NAME'].map(df[df['TYPE'].eq('ORIGINAL')].set_index('NAME')['ID'])
A more generic approach if you really need to match "ORIGINAL" and "TO_REPLACE" (assuming other values):
df['ID'] = (df['ID']
.mask(df['TYPE'].str.startswith('TO_REPLACE'))
.fillna(df['NAME'].map(df[df['TYPE'].eq('ORIGINAL')]
.set_index('NAME')['ID']))
)
Output:
ID NAME TYPE
0 ID1 COUNTRY1 ORIGINAL
1 ID1 COUNTRY1 TO_REPLACE1
2 ID1 COUNTRY1 TO_REPLACE2
3 ID6 COUNTRYX TO_REPLACE1
4 ID6 COUNTRYX TO_REPLACE2
5 ID6 COUNTRYX ORIGINAL