Search code examples
geospatialgeopandas

How can I merge two datasets based on their geometry columns, where one dataset has point geometries and the other has polygon geometries?


I would like to merge the following datasetS based on the "geometry" variable. The reason I am trying to merge is that I need the names of the subnational regions in the format of iso_3166_2 and iso_3166_2 in the dataset called iso_geo. If the point geometries lie in polygon/multiploygon geometries then it should match.

smart_city

year country_id geometry
2019 Germany POINT (7.08000 51.51000)
2018 Monaco POINT (7.41700 43.74000)

iso_geo

n_region_iso_name iso_3166_2 geometry
Escut de Canillo AD-02 POLYGON ((1.59735 42.62192, 1.60830 42.61812, ...
Escut d'Encamp AD-03 MULTYPOLYGON ((1.71709 42.53989, 1.71062 42.52774, ...
import geopandas as gpd
from shapely.geometry import Point

# Assuming 'smart_city' is the DataFrame with point geometries and 'iso_geometry' is the DataFrame with polygon geometries

# Convert the 'geometry' column in 'smart_city' DataFrame to Point geometries
smart_city['geometry'] = smart_city['geometry'].apply(lambda row: Point(row.x, row.y))

# Convert 'smart_city' DataFrame to a GeoDataFrame
smart_city_geo = gpd.GeoDataFrame(smart_city, geometry='geometry')

# Convert 'iso_geometry' DataFrame to a GeoDataFrame
iso_geometry_geo = gpd.GeoDataFrame(iso_geometry, geometry='geometry')

# Perform the spatial join
merged = gpd.sjoin(smart_city_geo, iso_geometry_geo, how='left', op='within')

# The 'merged' GeoDataFrame will contain the attributes from both DataFrames based on the spatial relationship
# between the points and polygons.

I used this code. However, even though I have geometry columns it asked me to convert the dataframe to geodataframe so I write those codes but then it does not match some points and I could not understand why.#


Solution

  • When a question lacks complete code, data, or other informations (module version, running environment), it is not easy to answer.

    Let me help this way. I offer the tested code below, you just test it. Try edit the data and rerun to see if you can get the result or not.

    The code runs well on my machine. The data is simple enough to grasp. If you get errors, post the error messages in the comment section below.

    import os
    os.environ['USE_PYGEOS'] = '0'
    import geopandas
    import geopandas as gpd
    import numpy as np
    from shapely.geometry import Polygon, Point, MultiPolygon
    
    # Four square polygons in Africa
    a = Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])
    b = Polygon([(0, 10), (0, 20), (10, 20), (10, 10)])
    c = Polygon([(10, 0), (10, 10), (20, 10), (20, 0)])
    d = MultiPolygon([
            (
                ((10, 10), (10, 20), (20, 20), (20, 10)), 
                [((20-2,20-1), (10+2,20-1), (15,17)),
                ((10+2,10+1), (20-2,10+1), (15,13))
                ]
            ),
            # More polygon here
        ]) # Polygon with 2 holes
    
    #((10,20), (20,20), (15,21))
    
    # Some points to go with the polygons
    pa = Point([5,5]) # in a
    pb = Point([15,5])
    pc = Point([5,15])
    pd = Point([15,15])
    pe = Point([15,15+3])
    pf = Point([5-6,5])
    
    # GeoDataFrame of all the polygons
    poly_gdf = gpd.GeoDataFrame({"Poly_name": ["A", "B", "C", "D"], "geometry": [a, b, c, d],
                        "colors": ["red","green","gray","brown"]}, crs="EPSG:4326")
    pnt_gdf = gpd.GeoDataFrame({"Point_name": ["pA", "pB", "pC", "pD", "pE", "pF"], 
                                "Value": [1,2,3,4,5,6],
                                "geometry": [pa, pb, pc, pd, pe, pf],
                        "colors": ["red","gray","green","brown", "black", "blue"]}, crs="EPSG:4326")
    # Plot them both
    ax1 = poly_gdf.plot(ec="k", fc=poly_gdf["colors"], alpha=0.5, zorder=9)
    pnt_gdf.plot(color=pnt_gdf["colors"], ax=ax1, alpha=1, zorder=10)
    

    Output plot for visualization/verification:

    join

    Quick Spatial Join

    points_w_poly = geopandas.sjoin(pnt_gdf, poly_gdf)
    points_w_poly
    
    Point_name  Value   geometry    colors_left index_right Poly_name   colors_right
    0   pA  1   POINT (5.00000 5.00000)     red     0   A   red
    1   pB  2   POINT (15.00000 5.00000)    gray    2   C   gray
    2   pC  3   POINT (5.00000 15.00000)    green   1   B   green
    3   pD  4   POINT (15.00000 15.00000)   brown   3   D   brown
    

    The points at the center of all the polygons can join successfully. The point inside the hole of a polygon or outside the outer-linear-rings of the polygons will not join.

    # View the attribute of the joined data (Value) for each polygon.
    points_w_poly[["Poly_name", "Value"]]
    
    Poly_name   Value
    0   A       1
    1   C       2
    2   B       3
    3   D       4

    More detailed spatial join

    The syntax of sjoin may be different for each version of geopandas.

    gpd.sjoin(pnt_gdf, poly_gdf, how='left', predicate='within', lsuffix='*Point', rsuffix='*Poly')
    

    Output:

        Point_name  Value   geometry    colors_*Point   index_*Poly Poly_name   colors_*Poly
    0   pA  1   POINT (5.00000 5.00000)     red     0.0   A     red
    1   pB  2   POINT (15.00000 5.00000)    gray    2.0   C     gray
    2   pC  3   POINT (5.00000 15.00000)    green   1.0   B     green
    3   pD  4   POINT (15.00000 15.00000)   brown   3.0   D     brown
    4   pE  5   POINT (15.00000 18.00000)   black   NaN   NaN   NaN
    5   pF  6   POINT (-1.00000 5.00000)    blue    NaN   NaN   NaN