Search code examples
pythonpandasdataframeindexingmask

Pandas update field value with the output of another condition


I have the following dataframe:

enter image description here

What I want to achieve is, if id1 is not equal to id2, I want to update the y values (in name column) with x value (x is when id1 equals to id2).

test = pd.DataFrame({'id1':[1, 1, 1, 2, 2, 1],
                     'id2': [1, 1, 1, 1, 1, 1],
                     'name': ['x', 'x', 'x', 'y', 'y', 'x']})

what I did is this:

test.loc[test['id1'] != test['id2'], 'name'] = test[test['id1'] == test['id2']]['name'].tolist()[0]

It works well, but I don't like the way I solved it (because of this .tolist()[0]). I feel there is another more correct solution.

Edit:

Sometimes we don't have id1==id2 in this Dataframe, so my solution will produce an error, because there no elements in test[test['id1'] == test['id2']]['name'].tolist()[0].


Solution

  • So you can use where then ffill / bfill and fillna to leave your dataframe untouched if no rows match id1 == id2:

    test['name'] = (test['name'].where(test['id2'] == test['id1'])
                                .ffill().bfill().fillna(test['name']))
    

    Output:

    >>> test
       id1  id2 name
    0    1    1    x
    1    1    1    x
    2    1    1    x
    3    2    1    x
    4    2    1    x
    5    1    1    x