I have the following dataframe:
What I want to achieve is, if id1 is not equal to id2, I want to update the y values (in name
column) with x value (x is when id1 equals to id2).
test = pd.DataFrame({'id1':[1, 1, 1, 2, 2, 1],
'id2': [1, 1, 1, 1, 1, 1],
'name': ['x', 'x', 'x', 'y', 'y', 'x']})
what I did is this:
test.loc[test['id1'] != test['id2'], 'name'] = test[test['id1'] == test['id2']]['name'].tolist()[0]
It works well, but I don't like the way I solved it (because of this .tolist()[0]
). I feel there is another more correct solution.
Edit:
Sometimes we don't have id1==id2
in this Dataframe, so my solution will produce an error, because there no elements in test[test['id1'] == test['id2']]['name'].tolist()[0]
.
So you can use where
then ffill / bfill
and fillna
to leave your dataframe untouched if no rows match id1 == id2
:
test['name'] = (test['name'].where(test['id2'] == test['id1'])
.ffill().bfill().fillna(test['name']))
Output:
>>> test
id1 id2 name
0 1 1 x
1 1 1 x
2 1 1 x
3 2 1 x
4 2 1 x
5 1 1 x