Search code examples
pythonpandasunique

Filling nan values if unique conditions on multiple columns are met in Pandas


I want to fill col3 nan values if the combination of col1 and col2 are matched in another row (but only once). For example: on row 2, since col1 and col2 are both cat, then fill col3 with 'George'. On row 4, since col1 and col2 are Cat and Dog, then fill col3 with Matt

However, if the combination has occurred more than once like in rows 5 and 7, then ignore don't fill nan with anything.

col1 col2 col3
Cat Cat George
Cat Cat nan
Cat Dog Matt
Cat Dog nan
Parrot Elk John
Parrot Elk nan
Parrot Elk Ryan
Parrot Elk nan
col1 col2 col3
Cat Cat George
Cat Cat George
Cat Dog Matt
Cat Dog Matt
Parrot Elk John
Parrot Elk nan
Parrot Elk Ryan
Parrot Elk nan

I hope this makes sense. Thanks


Solution

  • We can first get the sizes of each group col1, col3. Then those of size two we use ffill on:

    sizes = df.groupby(['col1', 'col2'])['col2'].transform("size")
    df.loc[sizes <= 2, "col3"] = df.loc[sizes <= 2, "col3"].ffill()
    
         col1 col2    col3
    0     Cat  Cat  George
    1     Cat  Cat  George
    2     Cat  Dog    Matt
    3     Cat  Dog    Matt
    4  Parrot  Elk    John
    5  Parrot  Elk     NaN
    6  Parrot  Elk    Ryan
    7  Parrot  Elk     NaN