I have a table with a lot of point information, and I need to fill the Position field after a row-wise comparison of the four preceding fields.
If the X- and Y-Coordinate are equal and ID_01 is equal as well, ID_02 has to be compared: the row with the lower ID_02 value (here 35) gets "End" in the Position field, and the row with the larger value (here 36) gets "Start".
X-Coordinate | Y-Coordinate | ID_01 | ID_02 | Position |
---|---|---|---|---|
45000 | 554000 | 15 | 35 | ? |
45000 | 554000 | 15 | 36 | ? |
94475 | 59530 | 1 | 1 | |
94491 | 60948 | 1 | 1 | |
94491 | 60948 | 1 | 2 | |
94151 | 64480 | 1 | 2 | |
94151 | 64480 | 1 | 3 | |
95408 | 68694 | 1 | 3 | |
95408 | 68694 | 1 | 4 | |
94703 | 69961 | 1 | 4 | |
94703 | 69961 | 1 | 5 | |
93719 | 70786 | 1 | 5 | |
93719 | 70786 | 1 | 6 | |
95310 | 72044 | 1 | 6 | |
95310 | 72044 | 1 | 7 | |
99525 | 82049 | 1 | 7 | |
99525 | 82049 | 1 | 8 | |
101600 | 84306 | 1 | 8 | |
102744 | 85032 | 1 | 9 | |
101600 | 84306 | 1 | 9 | |
102744 | 85032 | 1 | 10 | |
104155 | 86535 | 1 | 10 | |
104575 | 86430 | 1 | 11 | |
How would you handle this, for instance in a pandas DataFrame?
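You can use a boolean mask. First sort your values by ID_02, then check for duplicated values on the other three columns. The row flagged True (the larger ID_02 of each pair) gets the End position, the other the Start position. Rows without a duplicate also end up as Start. Swap the two labels in np.where if you need it the other way round, as the question describes: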
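import numpy as np
# reindex() puts the mask back into df's row order; np.where works positionally
m = df.sort_values('ID_02').duplicated(['X-Coordinate', 'Y-Coordinate', 'ID_01']).reindex(df.index)
df['Position'] = np.where(m, 'End', 'Start')
print(df)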
# Output
X-Coordinate Y-Coordinate ID_01 ID_02 Position
0 45000 554000 15 35 Start
1 45000 554000 15 36 End
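
If you want the labels exactly as the question states them (lower ID_02 gets "End", larger gets "Start") and prefer to leave points that have no partner row blank, a groupby-based variant might look like the sketch below. It is only an illustration under a few assumptions: the sample frame is rebuilt from the first rows of the question's table, the helper names (`keys`, `paired`, `is_low`, `is_high`) are made up here, and leaving unpaired rows empty is my reading of the question, not something it spells out.

```python
import numpy as np
import pandas as pd

# Small sample taken from the first rows of the question's table
df = pd.DataFrame({
    "X-Coordinate": [45000, 45000, 94475, 94491, 94491, 94151],
    "Y-Coordinate": [554000, 554000, 59530, 60948, 60948, 64480],
    "ID_01": [15, 15, 1, 1, 1, 1],
    "ID_02": [35, 36, 1, 1, 2, 2],
})

# Rows belong together when coordinates and ID_01 match
keys = ["X-Coordinate", "Y-Coordinate", "ID_01"]
grp = df.groupby(keys)["ID_02"]

# transform() returns values aligned with df's own index, so no sorting is needed
paired = grp.transform("size") > 1             # point appears more than once
is_low = df["ID_02"] == grp.transform("min")   # lowest ID_02 within its group
is_high = df["ID_02"] == grp.transform("max")  # highest ID_02 within its group

# Assumption: lower ID_02 -> "End", higher -> "Start", unpaired rows stay empty
df["Position"] = np.select(
    [paired & is_low, paired & is_high],
    ["End", "Start"],
    default="",
)
print(df)
```

Because `transform` keeps the original index, this does not depend on how the rows are ordered. If a point appears more than twice, only the lowest and highest ID_02 get a label and the rows in between stay empty.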