python pandas coordinates nearest-neighbor haversine

How do I calculate the Euclidean distance, in meters, to the nearest neighbour for each coordinate pair in a pandas DataFrame?


I have a dataframe like this:

index  place_id      var_lat_fact  var_lon_fact
0      167312091448  5.6679820000  -0.0144950000
1      167312091448  5.6686320000  -0.0157910000
2      167312091448  5.6653530000  -0.0181980000
3      167312091448  5.6700970000  -0.0191400000
4      167312091448  5.6689810000  -0.0104040000

For each coordinate pair (lat, lon), I'd like to calculate the Euclidean distance to its nearest neighbour within the dataframe, so that each point gets an additional column (say, nearest_neighbour_dist) holding that distance in meters.

Something like this:

index  place_id      var_lat_fact  var_lon_fact   nearest_neighbour_dist
0      167312091448  5.6679820000  -0.0144950000  160.588370
1      167312091448  5.6686320000  -0.0157910000  160.588370
2      167312091448  5.6653530000  -0.0181980000  451.525301
3      167312091448  5.6700970000  -0.0191400000  404.794908
4      167312091448  5.6689810000  -0.0104040000  466.104453

Just can't get my head around this... Any help would be greatly appreciated.


Solution

  • You can use sklearn's NearestNeighbors:

    from sklearn.neighbors import NearestNeighbors
    from numpy import deg2rad
    
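    # the haversine metric expects [latitude, longitude] in radians
    neigh = NearestNeighbors(n_neighbors=2, metric='haversine')
    data = deg2rad(df[['var_lat_fact', 'var_lon_fact']])
    neigh.fit(data)
    
    # query the two closest points for every row:
    # the closest one is the point itself (distance 0),
    # the second one is the closest non-self point
    distances, _ = neigh.kneighbors(data, return_distance=True)
    
    # haversine distances are angles in radians;
    # multiply by the Earth radius (6371 km) and by 1000 to get meters
    df['nearest_neighbour_dist'] = distances[:, -1] * 6371 * 1000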
    

    Output:

       index      place_id  var_lat_fact  var_lon_fact  nearest_neighbour_dist
    0      0  167312091448      5.667982     -0.014495              160.588370
    1      1  167312091448      5.668632     -0.015791              160.588370
    2      2  167312091448      5.665353     -0.018198              451.525301
    3      3  167312091448      5.670097     -0.019140              404.794908
    4      4  167312091448      5.668981     -0.010404              466.104453
    

    Points on a map

    I wanted to double-check the validity of the computations:

    1 -> 2 (index 0 -> 1 in your data) is indeed about 160.6 meters.
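
The same check can also be done numerically without scikit-learn. The block below is only a minimal sketch: it hard-codes the five points from the question, uses the same Earth radius as above (6371 km), and the helper names `haversine_m` and `nearest_neighbour_dists` are made up for illustration. It compares every pair of points, so it is only suitable for small inputs.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371 * 1000  # same radius as in the answer above, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_neighbour_dists(pts):
    """For each point, the distance to the closest *other* point (brute force, O(n^2))."""
    return [
        min(haversine_m(*p, *q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ]


# (var_lat_fact, var_lon_fact) pairs copied from the question
points = [
    (5.667982, -0.014495),
    (5.668632, -0.015791),
    (5.665353, -0.018198),
    (5.670097, -0.019140),
    (5.668981, -0.010404),
]

dists = nearest_neighbour_dists(points)
for i, d in enumerate(dists):
    print(i, round(d, 1))
```

The printed values should agree closely with the nearest_neighbour_dist column above. For larger dataframes the tree-based NearestNeighbors approach is the better choice, since the brute-force loop grows quadratically with the number of points.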
