I've got millions of geographic points. For each one of these, I want to find all "neighboring points," i.e., all other points within some radius, say a few hundred meters.
There is a naive O(N^2) solution to this problem---simply calculate the distance of all pairs of points. However, because I'm dealing with a proper distance metric (geographic distance), there should be a quicker way to do this.
I would like to do this within python. One solution that comes to mind is to use some database (mySQL with GIS extentions, PostGIS) and hope that such a database would take care of efficiently performing the operation described above using some index. I would prefer something simpler though, that doesn't require me to build and learn about such technologies.
A couple of points
Put in terms of python code, I want something along the lines of:
points = [(lat1, long1), (lat2, long2) ... ] # this list contains millions lat/long tuples
points_index = magical_indexer(points)
neighbors = []
for point in points:
point_neighbors = points_index.get_points_within(point, 200) # get all points within 200 meters of point
neighbors.append(point_neighbors)
Tipped off by Eamon, I've come up with a simple solution using btrees implemented in SciPy.
from scipy.spatial import cKDTree
from scipy import inf
max_distance = 0.0001 # Assuming lats and longs are in decimal degrees, this corresponds to 11.1 meters
points = [(lat1, long1), (lat2, long2) ... ]
tree = cKDTree(points)
point_neighbors_list = [] # Put the neighbors of each point here
for point in points:
distances, indices = tree.query(point, len(points), p=2, distance_upper_bound=max_distance)
point_neighbors = []
for index, distance in zip(indices, distances):
if distance == inf:
break
point_neighbors.append(points[index])
point_neighbors_list.append(point_neighbors)