Search code examples
pandasdataframemultiple-conditions

Pandas replace the value of a column based on the value of the same column in another row


I am trying to replace the values of a dataframe column with the value of the same column but from another row.

The ID of the COUNTRY with type "TO_REPLACE" must be replaced by the ID of the same COUNTRY with type "ORIGINAL".

Example DF:

ID     NAME        TYPE
---
ID1    COUNTRY1    ORIGINAL       <-- ORIGINAL of COUNTRY1 (ID1)
ID2    COUNTRY1    TO_REPLACE1    <-- ID must be ID1
ID3    COUNTRY1    TO_REPLACE2    <-- ID must be ID1
...
ID4    COUNTRYX    TO_REPLACE1    <-- ID must be ID6
ID5    COUNTRYX    TO_REPLACE2    <-- ID must be ID6
ID6    COUNTRYX    ORIGINAL       <-- ORIGINAL of COUNTRYX (ID6)

I need: if -TYPE- is "TO_REPLACE1" or "TO_REPLACE2" then find that -NAME- with -TYPE- "ORIGINAL" and put the "ORIGINAL ID" on ID. At the end in this example all -NAME- COUNTRY1 must be ID1 (the "ORIGINAL" ID) and COUNTRYX must be ID6.

Result:

ID     NAME        TYPE
---
ID1    COUNTRY1    ORIGINAL
ID1    COUNTRY1    TO_REPLACE1    <-- ID replaced
ID1    COUNTRY1    TO_REPLACE2    <-- ID replaced
...
ID6    COUNTRYX    TO_REPLACE1    <-- ID replaced
ID6    COUNTRYX    TO_REPLACE2    <-- ID replaced
ID6    COUNTRYX    ORIGINAL

After many attempts, the closest is this one:

df.loc[(df['TYPE'].isin(['TO_REPLACE1', 'TO_REPLACE2']), 'ID'] = df.loc[df['TYPE'] == 'ORIGINAL', 'ID'].values[0]

But replace only by the first "ORIGINAL ID" found.


Solution

  • You can mask/where and use a groupby.transform:

    df['ID'] = (df['ID']
     .where(df['TYPE'].eq('ORIGINAL'))
     .groupby(df['NAME']).transform('first')
    )
    

    Or with map:

    df['ID'] = df['NAME'].map(df[df['TYPE'].eq('ORIGINAL')].set_index('NAME')['ID'])
    

    A more generic approach if you really need to match "ORIGINAL" and "TO_REPLACE" (assuming other values):

    df['ID'] = (df['ID']
     .mask(df['TYPE'].str.startswith('TO_REPLACE'))
     .fillna(df['NAME'].map(df[df['TYPE'].eq('ORIGINAL')]
                            .set_index('NAME')['ID']))
    )
    

    Output:

        ID      NAME         TYPE
    0  ID1  COUNTRY1     ORIGINAL
    1  ID1  COUNTRY1  TO_REPLACE1
    2  ID1  COUNTRY1  TO_REPLACE2
    3  ID6  COUNTRYX  TO_REPLACE1
    4  ID6  COUNTRYX  TO_REPLACE2
    5  ID6  COUNTRYX     ORIGINAL