Search code examples
pythongeopandasshapely

Adding all points within polygon as value to dictionary object via loop


I have a shapefile contining 13 polygons, and an address list of ~33K addresses from the same general area as the polygons. I geocoded the addresses via Google geocoding API, and am now trying to see which addresses are in which polygons, grouping them by the name of the polygon.

I can get things to work one polygon at a time, but I'm missing something in the loop.

Here's where I'm at, so far:

# Import shapefile and convert coordinates to match address file
sf = gpd.read_file(MY_SHAPEFILE)
sf_geo = sf.to_crs(epsg=4326)

# Import geocoded addresses
address_data = pd.read_csv(ADDRESS_FILE)
# Create points from lat/lon coordinate columns in address file
geometry_points = [Point(xy) for xy in zip(address_data['longitude'], 
                   address_data['latitude'])]

# Create object from one of the Polygons
p = sf_geo.iloc[0].geometry

i = 0
for point in geometry_points:
    if point.within(p):
        i += 1
        print(i)
    else:
        continue   

The above works just fine, on all polygons. However, what I am really hoping for is to be able to update a dict where the key is the actual name of the polygon, and the values are all of the points that match within that polygon. Then, I can just add the polygon names to the address list.

# Extract the names of each polygon
area_names = list(sf_geo['Name'])
# Create dict of polygon name : geometry
for r in sf_geo:
    shape_dict = dict(zip(area_names, sf['geometry']))

# Initialize empty dict to hold list of all addresses within each polygon
polygon_contains_dict = {k: [] for k in area_names}

The above creates a dict of this format when printed:

{'10 ppm': <shapely.geometry.polygon.Polygon object at 0x7fea194225d0>, '20 ppm': <shapely.geometry.polygon.Polygon object at 0x7fe9f2e23590>, ETC}

as well as a dict where the keys are the same as shape_dict, but the values are empty lists.

I'm using the following to try and loop through all of the keys in shape_dict and all of the points that were created from the addresses, and update a list that then becomes the values for each key in polygon_contains_dict:

for key, value in shape_dict.items():
    contains_list = []
    not_contained = []
    for point in geometry_points:
        if point.within(value):
            contains_list.append(point)
        else:
            not_contained.append(point)
    polygon_contains_dict[key] = contains_list

However, this adds nothing to neither the contains_list nor (obviously) the values in polygon_contains_dict. All points get dumped in not_contained.

Since I know that points are, in fact, within some polygons, I know I'm missing something. All points in geometry_points are Point objects, and all polygons in shape_dict.values are Polygon objects.

What am I missing? Thanks for any help.


Solution

  • I suggest you avoid looping entirely and create a second geopandas dataframe for your coordinate and addresses data and then do a spatial join:

    # Import geocoded addresses
    address_data = pd.read_csv(ADDRESS_FILE)
    # Create points from lat/lon coordinate columns in address file
    geometry_points = [Point(xy) for xy in zip(address_data['longitude'], 
                   address_data['latitude'])]
    address_gpd=gpd.GeoDataFrame(address_data,crs={'init': 'epsg:4326'},geometry=geometry_points) # second geopandas frame
    
    # inner spatial join with shapefile 
    df=gpd.sjoin(sf_geo,address_gpd,how='inner',op='intersects') 
    

    The df dataframe will now have all the addresses within each polygon then to "update a dict where the key is the actual name of the polygon, and the values are all of the points that match within that polygon" you can use groupby and to_dict

    df=df.groupby(['Name'])['address'].apply(list)
    polygon_contains_dict=df.to_dict('index') 
    

    where I asumed the column name for your addresses is is address so change that if not the case.

    See geopandas docs on merging data for more info on spatial joins.