I want to fill col3 nan values if the combination of col1 and col2 are matched in another row (but only once). For example: on row 2, since col1 and col2 are both cat, then fill col3 with 'George'. On row 4, since col1 and col2 are Cat and Dog, then fill col3 with Matt
However, if the combination has occurred more than once like in rows 5 and 7, then ignore don't fill nan with anything.
col1 | col2 | col3 |
---|---|---|
Cat | Cat | George |
Cat | Cat | nan |
Cat | Dog | Matt |
Cat | Dog | nan |
Parrot | Elk | John |
Parrot | Elk | nan |
Parrot | Elk | Ryan |
Parrot | Elk | nan |
col1 | col2 | col3 |
---|---|---|
Cat | Cat | George |
Cat | Cat | George |
Cat | Dog | Matt |
Cat | Dog | Matt |
Parrot | Elk | John |
Parrot | Elk | nan |
Parrot | Elk | Ryan |
Parrot | Elk | nan |
I hope this makes sense. Thanks
We can first get the sizes of each group col1, col3
. Then those of size two we use ffill
on:
sizes = df.groupby(['col1', 'col2'])['col2'].transform("size")
df.loc[sizes <= 2, "col3"] = df.loc[sizes <= 2, "col3"].ffill()
col1 col2 col3
0 Cat Cat George
1 Cat Cat George
2 Cat Dog Matt
3 Cat Dog Matt
4 Parrot Elk John
5 Parrot Elk NaN
6 Parrot Elk Ryan
7 Parrot Elk NaN