
How to compute row-wise comparison of multiple columns?


I have a table with many point records, and I need to fill the Position field by comparing rows on the four fields before it.

If two rows have the same X-Coordinate, Y-Coordinate and ID_01, I need to compare their ID_02 values: the row with the lower ID_02 (35 here) should get "End" in the Position field, and the row with the higher ID_02 (36) should get "Start".

    X-Coordinate  Y-Coordinate  ID_01  ID_02  Position
           45000        554000     15     35  ?
           45000        554000     15     36  ?
           94475         59530      1      1
           94491         60948      1      1
           94491         60948      1      2
           94151         64480      1      2
           94151         64480      1      3
           95408         68694      1      3
           95408         68694      1      4
           94703         69961      1      4
           94703         69961      1      5
           93719         70786      1      5
           93719         70786      1      6
           95310         72044      1      6
           95310         72044      1      7
           99525         82049      1      7
           99525         82049      1      8
          101600         84306      1      8
          102744         85032      1      9
          101600         84306      1      9
          102744         85032      1     10
          104155         86535      1     10
          104575         86430      1     11
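To make the question reproducible, the sample table can be loaded into a DataFrame from its whitespace-separated text. This is a sketch that assumes the Position column is left out, since it is the value to be computed; the variable name `raw` is just illustrative.

```python
import io

import pandas as pd

# Sample data from the table above, without the Position column
raw = """\
X-Coordinate Y-Coordinate ID_01 ID_02
45000 554000 15 35
45000 554000 15 36
94475 59530 1 1
94491 60948 1 1
94491 60948 1 2
94151 64480 1 2
94151 64480 1 3
95408 68694 1 3
95408 68694 1 4
94703 69961 1 4
94703 69961 1 5
93719 70786 1 5
93719 70786 1 6
95310 72044 1 6
95310 72044 1 7
99525 82049 1 7
99525 82049 1 8
101600 84306 1 8
102744 85032 1 9
101600 84306 1 9
102744 85032 1 10
104155 86535 1 10
104575 86430 1 11
"""

# Columns are separated by runs of whitespace; dtypes are inferred as integers
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.head())
```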

How would you handle this in a pandas DataFrame, for instance?


Solution

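  • You can use a boolean mask. First sort the rows by ID_02, then flag duplicates on the three key columns with keep='last', so every row except the one with the highest ID_02 in its group is True. Rows flagged True (the lower ID_02) get End; all other rows get Start, including points that appear only once. Because the mask comes back in sorted order, reindex it to the original row order before assigning it:

    import numpy as np
    import pandas as pd

    keys = ['X-Coordinate', 'Y-Coordinate', 'ID_01']
    # Sort by ID_02 so the lower ID_02 comes first within each key group;
    # keep='last' marks every row except the last (highest ID_02) one as True
    m = df.sort_values('ID_02').duplicated(keys, keep='last')
    # m follows the sorted order: realign it with df's index before np.where
    m = m.reindex(df.index)
    df['Position'] = np.where(m, 'End', 'Start')
    print(df.head(2))
    
    # Output
       X-Coordinate  Y-Coordinate  ID_01  ID_02 Position
    0         45000        554000     15     35      End
    1         45000        554000     15     36    Start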
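An alternative sketch that needs no sorting or reindexing compares each row's ID_02 with the maximum ID_02 of its group, using `groupby(...).transform('max')`, which returns a result aligned with the original index. It uses the first six rows of the question's table and assumes ID_02 values are distinct within a group; rows holding their group's maximum, including points that appear only once, are labeled Start (the question does not say what unmatched points should get).

```python
import numpy as np
import pandas as pd

# First six rows of the question's table
df = pd.DataFrame({
    "X-Coordinate": [45000, 45000, 94475, 94491, 94491, 94151],
    "Y-Coordinate": [554000, 554000, 59530, 60948, 60948, 64480],
    "ID_01": [15, 15, 1, 1, 1, 1],
    "ID_02": [35, 36, 1, 1, 2, 2],
})

keys = ["X-Coordinate", "Y-Coordinate", "ID_01"]

# Highest ID_02 among rows sharing the same coordinates and ID_01,
# broadcast back to every row of the group (index-aligned with df)
group_max = df.groupby(keys)["ID_02"].transform("max")

# Below the group maximum -> lower ID_02 -> "End"; otherwise "Start"
df["Position"] = np.where(df["ID_02"] < group_max, "End", "Start")
print(df)
```

Because the comparison is done row by row against an index-aligned Series, the result does not depend on the order of the rows in `df`.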