Search code examples
pandasdataframelambdaconditional-statements

Populate DataFrame column A where value in column B exists in another row


I have a DataFrame of the following construction:

Column A Column B
PK1 RTLE
PK2 AXFM
PK1
PK2
PK3
PK4
PK4 UVRL
PK5

As shown, column A contains a number of non-unique values. Column B is sometimes populated.

Where column B is nan I need to be to use the value in column A to search the table for any other rows containing the same value, and then populate the original nan column with any existing value found in column B for the second row.

Column A Column B
PK1 RTLE
PK2 AXFM
PK1 RTLE
PK2 AXFM
PK3
PK4 UVRL
PK4 UVRL
PK5

I tried using and lambda and ffill() and bfill() methods for this but my worry here is that the DataFrame may sometimes contain only one instance of a value in column A (eg PK3, and PK5 above) and I do not want them to be populated with the wrong value from forward or backfilling.

Any other ideas?


Solution

  • I would do it this way :

    df["Column B"] = df["Column A"].map(df.groupby("Column A")["Column B"].first())
    

    ​ Output :

    print(df)
    
      Column A Column B
    0      PK1     RTLE
    1      PK2     AXFM
    2      PK1     RTLE
    3      PK2     AXFM
    4      PK3     None
    5      PK4     UVRL
    6      PK4     UVRL
    7      PK5     None