Search code examples
pythonscipygeolocationgeospatiallatitude-longitude

KDTree is Returning Points Outside of Radius


I have an array of lat long coordinates and I am trying to use a KDTree and scipy's query_ball_point to return all data points within a 1 mile radius of a designated latitude and longitude.

The problem is that query_ball_point is returning points that are outside of the specified 1 mile radius. Here's my code:

import pandas as pd
import scipy as sp
import geocoder
import pysal as psl


search_list = df['coordinates'].tolist()
tree = psl.cg.KDTree(search_list, distance_metric='Arc', radius=psl.cg.RADIUS_EARTH_MILES)
latlong = (39.698840000000004, -104.975916)
index = tree.query_ball_point(latlong,r=1)

The result is an array of coordinates like the following:

+---------------------------------------+
|              coordinates              |
+---------------------------------------+
| (39.676973877551, -104.966231826172)  |
| (39.6777407534644, -104.988982458831) |
| ...                                   |
+---------------------------------------+

When I try to use the haversine formula to validate these results, I see the first coordinate is 1.6 miles way

from haversine import haversine
haversine((39.676973877551, -104.966231826172),
         (39.698840000000004, -104.975916),miles=True)

1.5961362762187963

Solution

  • Pysal doesn't use the haversine function to calculate distance for the query_ball_point method. It uses the pysal.cg.sphere.arcdist function, which is different.

    import pysal
    from pysal.cg.kdtree import KDTree    
    
    locations = [(40.702566, -73.816859),
             (40.70546, -73.810708),
             (40.709179, -73.820574),
             (40.700486, -73.807969),
             (40.694624, -73.820593),
             (40.695132, -73.820841),
             (40.694095, -73.821334),
             (40.694165, -73.822368),
             (40.695077, -73.822817),
             (40.6747769261, -73.8092618174)] 
    tree = KDTree(locations, distance_metric='Arc', radius=pysal.cg.RADIUS_EARTH_MILES)
    current_point = (40.709523, -73.802472)
    # get all points within X miles of 'current_point'
    indices = tree.query_ball_point(current_point, 1)
    for i in indices:
        print(locations[i])
    

    There are 3 points within 1 mile

    (40.70546, -73.810708)
    (40.700486, -73.807969)
    (40.6747769261, -73.8092618174)
    

    Not all these points are within 1 mile according to haversine formula:

    from haversine import haversine
    for i in indices:
        print(haversine(current_points, locations[i], miles = True))
    
    0.5146716729994124
    0.6875825817591269
    2.4269297885659022
    

    But they are within 1 mile according to pysal's arcdist formula using radius of 3958.756 miles:

    from pysal.cg.sphere import arcdist
    for i in indices:
        print(arcdist(current_points, locations[i], 3958.756))
    
    0.5744128196875283
    0.4178272122350164
    0.8175408580090955