I have a DataFrame of the following construction:
Column A | Column B |
---|---|
PK1 | RTLE |
PK2 | AXFM |
PK1 | |
PK2 | |
PK3 | |
PK4 | |
PK4 | UVRL |
PK5 |
As shown, column A contains a number of non-unique values. Column B is sometimes populated.
Where column B is nan I need to be to use the value in column A to search the table for any other rows containing the same value, and then populate the original nan column with any existing value found in column B for the second row.
Column A | Column B |
---|---|
PK1 | RTLE |
PK2 | AXFM |
PK1 | RTLE |
PK2 | AXFM |
PK3 | |
PK4 | UVRL |
PK4 | UVRL |
PK5 |
I tried using and lambda and ffill() and bfill() methods for this but my worry here is that the DataFrame may sometimes contain only one instance of a value in column A (eg PK3, and PK5 above) and I do not want them to be populated with the wrong value from forward or backfilling.
Any other ideas?
I would do it this way :
df["Column B"] = df["Column A"].map(df.groupby("Column A")["Column B"].first())
Output :
print(df)
Column A Column B
0 PK1 RTLE
1 PK2 AXFM
2 PK1 RTLE
3 PK2 AXFM
4 PK3 None
5 PK4 UVRL
6 PK4 UVRL
7 PK5 None