python pandas distance geopy

How to create a column in Pandas with distance from coordinates using GeoPy


I have this df:

    latitude_1  longitude_1
0   -25.294871  -56.992654
1   -24.946374  -57.384543
2   -24.835273  -53.825342
3   -24.153553  -54.363844

And the following coordinates:

coords_2 = (-25.236632,  -56.835262)

I want to add a third column to the df that holds the distance between each row's coordinates and coords_2.

If I do it without a DataFrame, it works (here I'm using arbitrary coordinates):

import geopy.distance

coords_1 = (52.2296756, 21.0122287)
coords_2 = (43.263845, -42.2637377)

print(geopy.distance.distance(coords_1, coords_2).km)

Output:

4691.07078306837

How can I apply the same logic to a DataFrame?

Thanks


Solution

  • If you want the distance from each row of your df to an external coordinate tuple, try this:

    import pandas as pd
    import geopy.distance
    
    df = pd.DataFrame(data={'latitude_1': [-25.294871, -24.946374],
                            'longitude_1': [-56.992654, -57.384543]})
    coords_2 = (-25.236632, -56.835262)
    
    # Geodesic distance in km from each row's (lat, lon) to coords_2
    df['distance'] = df.apply(
        lambda x: geopy.distance.distance((x.latitude_1, x.longitude_1), coords_2).km,
        axis=1)
    print(df)
    
    Output:
    
       latitude_1  longitude_1   distance
    0  -25.294871   -56.992654  17.116773
    1  -24.946374   -57.384543  64.062048
    

    Or skip apply and loop over the two columns as NumPy arrays with to_numpy():

    def distance(l1, l2, coords_2):
        # Distance in km from each (lat, lng) pair to coords_2
        return [geopy.distance.distance((lat, lng), coords_2).km for lat, lng in zip(l1, l2)]
    
    df['distance'] = distance(df["latitude_1"].to_numpy(), df["longitude_1"].to_numpy(), coords_2)
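
Both approaches above still call geopy once per row. If an approximate result is acceptable on a large DataFrame, one option is a vectorized haversine (great-circle) formula in NumPy. This is only a sketch and not geopy's method: it assumes a spherical Earth with a mean radius of 6371.0088 km, and the names haversine_km and EARTH_RADIUS_KM are made up for illustration. Its results typically differ from geopy's ellipsoidal geodesic distance by a fraction of a percent.

```python
import numpy as np

# Assumption: spherical Earth with this mean radius (geopy uses an ellipsoid instead)
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lat, lon, ref):
    """Great-circle distance in km from each (lat, lon) pair to the point ref."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    ref_lat, ref_lon = np.radians(ref[0]), np.radians(ref[1])
    # Haversine formula, evaluated on whole arrays at once
    a = (np.sin((lat - ref_lat) / 2) ** 2
         + np.cos(lat) * np.cos(ref_lat) * np.sin((lon - ref_lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

coords_2 = (-25.236632, -56.835262)
lats = [-25.294871, -24.946374]
lons = [-56.992654, -57.384543]
approx = haversine_km(lats, lons, coords_2)
print(approx)

# With the df from the answer, the same call would be:
# df['distance_approx'] = haversine_km(df['latitude_1'].to_numpy(),
#                                      df['longitude_1'].to_numpy(), coords_2)
```

Use geopy.distance.distance when you need the ellipsoidal result. The sketch trades some accuracy for computing all rows in one array operation.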