I would like to merge the following datasetS based on the "geometry" variable. The reason I am trying to merge is that I need the names of the subnational regions in the format of iso_3166_2 and iso_3166_2 in the dataset called iso_geo. If the point geometries lie in polygon/multiploygon geometries then it should match.
smart_city
year | country_id | geometry |
---|---|---|
2019 | Germany | POINT (7.08000 51.51000) |
2018 | Monaco | POINT (7.41700 43.74000) |
iso_geo
n_region_iso_name | iso_3166_2 | geometry |
---|---|---|
Escut de Canillo | AD-02 | POLYGON ((1.59735 42.62192, 1.60830 42.61812, ... |
Escut d'Encamp | AD-03 | MULTYPOLYGON ((1.71709 42.53989, 1.71062 42.52774, ... |
import geopandas as gpd
from shapely.geometry import Point
# Assuming 'smart_city' is the DataFrame with point geometries and 'iso_geometry' is the DataFrame with polygon geometries
# Convert the 'geometry' column in 'smart_city' DataFrame to Point geometries
smart_city['geometry'] = smart_city['geometry'].apply(lambda row: Point(row.x, row.y))
# Convert 'smart_city' DataFrame to a GeoDataFrame
smart_city_geo = gpd.GeoDataFrame(smart_city, geometry='geometry')
# Convert 'iso_geometry' DataFrame to a GeoDataFrame
iso_geometry_geo = gpd.GeoDataFrame(iso_geometry, geometry='geometry')
# Perform the spatial join
merged = gpd.sjoin(smart_city_geo, iso_geometry_geo, how='left', op='within')
# The 'merged' GeoDataFrame will contain the attributes from both DataFrames based on the spatial relationship
# between the points and polygons.
I used this code. However, even though I have geometry columns it asked me to convert the dataframe to geodataframe so I write those codes but then it does not match some points and I could not understand why.#
When a question lacks complete code, data, or other informations (module version, running environment), it is not easy to answer.
Let me help this way. I offer the tested code below, you just test it. Try edit the data and rerun to see if you can get the result or not.
The code runs well on my machine. The data is simple enough to grasp. If you get errors, post the error messages in the comment section below.
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas
import geopandas as gpd
import numpy as np
from shapely.geometry import Polygon, Point, MultiPolygon
# Four square polygons in Africa
a = Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])
b = Polygon([(0, 10), (0, 20), (10, 20), (10, 10)])
c = Polygon([(10, 0), (10, 10), (20, 10), (20, 0)])
d = MultiPolygon([
(
((10, 10), (10, 20), (20, 20), (20, 10)),
[((20-2,20-1), (10+2,20-1), (15,17)),
((10+2,10+1), (20-2,10+1), (15,13))
]
),
# More polygon here
]) # Polygon with 2 holes
#((10,20), (20,20), (15,21))
# Some points to go with the polygons
pa = Point([5,5]) # in a
pb = Point([15,5])
pc = Point([5,15])
pd = Point([15,15])
pe = Point([15,15+3])
pf = Point([5-6,5])
# GeoDataFrame of all the polygons
poly_gdf = gpd.GeoDataFrame({"Poly_name": ["A", "B", "C", "D"], "geometry": [a, b, c, d],
"colors": ["red","green","gray","brown"]}, crs="EPSG:4326")
pnt_gdf = gpd.GeoDataFrame({"Point_name": ["pA", "pB", "pC", "pD", "pE", "pF"],
"Value": [1,2,3,4,5,6],
"geometry": [pa, pb, pc, pd, pe, pf],
"colors": ["red","gray","green","brown", "black", "blue"]}, crs="EPSG:4326")
# Plot them both
ax1 = poly_gdf.plot(ec="k", fc=poly_gdf["colors"], alpha=0.5, zorder=9)
pnt_gdf.plot(color=pnt_gdf["colors"], ax=ax1, alpha=1, zorder=10)
Output plot for visualization/verification:
Quick Spatial Join
points_w_poly = geopandas.sjoin(pnt_gdf, poly_gdf)
points_w_poly
Point_name Value geometry colors_left index_right Poly_name colors_right 0 pA 1 POINT (5.00000 5.00000) red 0 A red 1 pB 2 POINT (15.00000 5.00000) gray 2 C gray 2 pC 3 POINT (5.00000 15.00000) green 1 B green 3 pD 4 POINT (15.00000 15.00000) brown 3 D brown
The points at the center of all the polygons can join successfully. The point inside the hole of a polygon or outside the outer-linear-rings of the polygons will not join.
# View the attribute of the joined data (Value) for each polygon.
points_w_poly[["Poly_name", "Value"]]
Poly_name Value 0 A 1 1 C 2 2 B 3 3 D 4
More detailed spatial join
The syntax of sjoin
may be different for each version of geopandas.
gpd.sjoin(pnt_gdf, poly_gdf, how='left', predicate='within', lsuffix='*Point', rsuffix='*Poly')
Output:
Point_name Value geometry colors_*Point index_*Poly Poly_name colors_*Poly 0 pA 1 POINT (5.00000 5.00000) red 0.0 A red 1 pB 2 POINT (15.00000 5.00000) gray 2.0 C gray 2 pC 3 POINT (5.00000 15.00000) green 1.0 B green 3 pD 4 POINT (15.00000 15.00000) brown 3.0 D brown 4 pE 5 POINT (15.00000 18.00000) black NaN NaN NaN 5 pF 6 POINT (-1.00000 5.00000) blue NaN NaN NaN