Search code examples
pythongeojsongeocodeshapely

How do I reverse geocode a large number of points in python?


Right now I have a GeoJson File and the following function using shapely:

It takes in a coordinate and returns the neighborhood name

    def get_neighb(lat, lon):
    """Input Latitude and Longitude, Returns Neighborhood Name"""
    point = Point(lon, lat)
    found = False
    for feature in geo_data['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return(feature['properties']['neighborhood'])
            found = True
    if found is False:
        return('NA')

# Initialize list
tn = ['']*data.shape[0]
for i in range(len(tn)):
    tn[i] = get_neighb(data.latitude[i], data.longitude[i])

This works, but it is really slow, any thoughts on how I could speed it up, currently running it on 4,000,000 row.


Solution

  • If you want to avoid for example the heavy-duty machinery of a PostGIS database, it could be of interest to employ the rtree package as (as the documentation states) "cheapo spatial database". The idea would be mostly as follows:

    #!/usr/bin/env python
    from itertools import product
    from random import uniform, sample, seed
    from rtree import index
    from shapely.geometry import Point, Polygon, box, shape
    from shapely.affinity import translate
    
    seed(666)
    
    #generate random polygons, in your case, the polygons are stored
    #in geo_data['features']
    P = Polygon([(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5), (0, 0)])
    polygons = []
    for dx, dy in product(range(0, 100), range(0, 100)):
        polygons.append(translate(P, dx, dy))
    
    #construct the spatial index and insert bounding boxes of all polygons
    idx = index.Index()
    for pid, P in enumerate(polygons):
        idx.insert(pid, P.bounds)
    
    delta = 0.5
    for i in range(0, 1000):
        #generate random points
        x, y = uniform(0, 10), uniform(0, 10)
        pnt = Point(x, y)
    
        #create a region around the point of interest
        bounds = (x-delta, y-delta, x+delta, y+delta)
    
        #also possible, but much slower
        #bounds = pnt.buffer(delta).bounds
    
        #the index tells us which polygons are worth checking, i.e.,
        #the bounding box of which intersects with the region constructed in previous step
        for candidate in idx.intersection(bounds):
            P = polygons[candidate]
    
            #test only these candidates
            if P.contains(pnt):
                print(pnt, P)