Search code examples
pythongisgeospatialdistancegeo

Find all coordinates within a circle in geographic data in python


I've got millions of geographic points. For each one of these, I want to find all "neighboring points," i.e., all other points within some radius, say a few hundred meters.

There is a naive O(N^2) solution to this problem---simply calculate the distance of all pairs of points. However, because I'm dealing with a proper distance metric (geographic distance), there should be a quicker way to do this.

I would like to do this within python. One solution that comes to mind is to use some database (mySQL with GIS extentions, PostGIS) and hope that such a database would take care of efficiently performing the operation described above using some index. I would prefer something simpler though, that doesn't require me to build and learn about such technologies.

A couple of points

  • I will perform the "find neighbors" operation millions of times
  • The data will remain static
  • Because the problem is in a sense simple, I'd like to see they python code that solves it.

Put in terms of python code, I want something along the lines of:

points = [(lat1, long1), (lat2, long2) ... ] # this list contains millions lat/long tuples
points_index = magical_indexer(points)
neighbors = []
for point in points:
    point_neighbors = points_index.get_points_within(point, 200) # get all points within 200 meters of point
    neighbors.append(point_neighbors) 

Solution

  • Tipped off by Eamon, I've come up with a simple solution using btrees implemented in SciPy.

    from scipy.spatial import cKDTree
    from scipy import inf
    
    max_distance = 0.0001 # Assuming lats and longs are in decimal degrees, this corresponds to 11.1 meters
    points = [(lat1, long1), (lat2, long2) ... ]
    tree = cKDTree(points)
    
    point_neighbors_list = [] # Put the neighbors of each point here
    
    for point in points:
        distances, indices = tree.query(point, len(points), p=2, distance_upper_bound=max_distance)
        point_neighbors = []
        for index, distance in zip(indices, distances):
            if distance == inf:
                break
            point_neighbors.append(points[index])
        point_neighbors_list.append(point_neighbors)