I have a shapefile contining 13 polygons, and an address list of ~33K addresses from the same general area as the polygons. I geocoded the addresses via Google geocoding API, and am now trying to see which addresses are in which polygons, grouping them by the name of the polygon.
I can get things to work one polygon at a time, but I'm missing something in the loop.
Here's where I'm at, so far:
# Import shapefile and convert coordinates to match address file
sf = gpd.read_file(MY_SHAPEFILE)
sf_geo = sf.to_crs(epsg=4326)
# Import geocoded addresses
address_data = pd.read_csv(ADDRESS_FILE)
# Create points from lat/lon coordinate columns in address file
geometry_points = [Point(xy) for xy in zip(address_data['longitude'],
address_data['latitude'])]
# Create object from one of the Polygons
p = sf_geo.iloc[0].geometry
i = 0
for point in geometry_points:
if point.within(p):
i += 1
print(i)
else:
continue
The above works just fine, on all polygons. However, what I am really hoping for is to be able to update a dict where the key is the actual name of the polygon, and the values are all of the points that match within that polygon. Then, I can just add the polygon names to the address list.
# Extract the names of each polygon
area_names = list(sf_geo['Name'])
# Create dict of polygon name : geometry
for r in sf_geo:
shape_dict = dict(zip(area_names, sf['geometry']))
# Initialize empty dict to hold list of all addresses within each polygon
polygon_contains_dict = {k: [] for k in area_names}
The above creates a dict of this format when printed:
{'10 ppm': <shapely.geometry.polygon.Polygon object at 0x7fea194225d0>, '20 ppm': <shapely.geometry.polygon.Polygon object at 0x7fe9f2e23590>, ETC}
as well as a dict where the keys are the same as shape_dict, but the values are empty lists.
I'm using the following to try and loop through all of the keys in shape_dict and all of the points that were created from the addresses, and update a list that then becomes the values for each key in polygon_contains_dict:
for key, value in shape_dict.items():
contains_list = []
not_contained = []
for point in geometry_points:
if point.within(value):
contains_list.append(point)
else:
not_contained.append(point)
polygon_contains_dict[key] = contains_list
However, this adds nothing to neither the contains_list nor (obviously) the values in polygon_contains_dict. All points get dumped in not_contained.
Since I know that points are, in fact, within some polygons, I know I'm missing something. All points in geometry_points are Point objects, and all polygons in shape_dict.values are Polygon objects.
What am I missing? Thanks for any help.
I suggest you avoid looping entirely and create a second geopandas dataframe for your coordinate and addresses data and then do a spatial join:
# Import geocoded addresses
address_data = pd.read_csv(ADDRESS_FILE)
# Create points from lat/lon coordinate columns in address file
geometry_points = [Point(xy) for xy in zip(address_data['longitude'],
address_data['latitude'])]
address_gpd=gpd.GeoDataFrame(address_data,crs={'init': 'epsg:4326'},geometry=geometry_points) # second geopandas frame
# inner spatial join with shapefile
df=gpd.sjoin(sf_geo,address_gpd,how='inner',op='intersects')
The df
dataframe will now have all the addresses within each polygon then to "update a dict where the key is the actual name of the polygon, and the values are all of the points that match within that polygon" you can use groupby and to_dict
df=df.groupby(['Name'])['address'].apply(list)
polygon_contains_dict=df.to_dict('index')
where I asumed the column name for your addresses is is address
so change that if not the case.
See geopandas docs on merging data for more info on spatial joins.