Search code examples
pythondataframeloopsif-statementlambda

Calculations based on values from 2 tables


I am trying to find the farthest point by coordinates. I have 2 data tables:

data={'X':[1,1,1,3,4],'Y':[1,2,4,3,5]}
data=pd.DataFrame(data)

points={'ID':['1','2','3','4'],'X':[1,2,4,5],'Y':[3,3,4,1]}
points=pd.DataFrame(points)

I would like to determine which point from the "points" table is the farthest one from the coordinates included in the "data" table. The calculation I wish to use is the vector distance as follows.

d = √(x2 −x1)2 +(y2 −y1)2

Based on the example, this is my "data": photo in link And this is "points" table: photo in link

I would like to indicate that the ID 4 in the "points" table is the farthest point (red one in the picture) from the coordinates in the "data" table. What I've tried:

points['DISTANCE']=data.apply(lambda x: (np.sqrt(((x['X']-points['X'])**2)+((x['Y']-points['Y'])**2))).max(), axis=1)

Unfortunately the code returns a wrong result, not what I expected. ID 4 should have the most value in DISTANCE column.

ID X Y DISTANCE
1 1 3 4.242641
2 2 3 3.605551
3 4 4 3.000000
4 1 5 2.828427

I am asking for help in solving this problem.


Solution

  • I think you want to do something like this:

    data={'X':[1,1,1,3,4],'Y':[1,2,4,3,5]}
    data=pd.DataFrame(data)
    
    points={'ID':['1','2','3','4'],'X':[1,2,4,5],'Y':[3,3,4,1]}
    points=pd.DataFrame(points)
    
    def distance(x1, x2, y1, y2):
        return np.sqrt(
            ((x2-x1)**2) + ((y2-y1)**2)
        )
        
    results = []
    for n, row in points.iterrows():
        id = row.ID
        x1 = row.X
        y1 = row.Y
        for _n, _row in data.iterrows():
            x2 = _row.X
            y2 = _row.Y
            d = distance(x1, x2, y1, y2)
            results.append((d, id))
            print(f'{id} Point({x1},{y1}) vs Data({x2},{y2}) -> {d}')
    

    Output:

    1 Point(1,3) vs Data(1,1) -> 2.0
    1 Point(1,3) vs Data(1,2) -> 1.0
    1 Point(1,3) vs Data(1,4) -> 1.0
    1 Point(1,3) vs Data(3,3) -> 2.0
    1 Point(1,3) vs Data(4,5) -> 3.605551275463989
    2 Point(2,3) vs Data(1,1) -> 2.23606797749979
    2 Point(2,3) vs Data(1,2) -> 1.4142135623730951
    2 Point(2,3) vs Data(1,4) -> 1.4142135623730951
    2 Point(2,3) vs Data(3,3) -> 1.0
    2 Point(2,3) vs Data(4,5) -> 2.8284271247461903
    3 Point(4,4) vs Data(1,1) -> 4.242640687119285
    3 Point(4,4) vs Data(1,2) -> 3.605551275463989
    3 Point(4,4) vs Data(1,4) -> 3.0
    3 Point(4,4) vs Data(3,3) -> 1.4142135623730951
    3 Point(4,4) vs Data(4,5) -> 1.0
    4 Point(5,1) vs Data(1,1) -> 4.0
    4 Point(5,1) vs Data(1,2) -> 4.123105625617661
    4 Point(5,1) vs Data(1,4) -> 5.0
    4 Point(5,1) vs Data(3,3) -> 2.8284271247461903
    4 Point(5,1) vs Data(4,5) -> 4.123105625617661
    

    To get the most distant:

    max_distance, id_max_distance = sorted(results)[-1]
    print(max_distance) #5.0
    print(id_max_distance) #4