Search code examples
pythongeospatialgeopandas

Geopandas buffer and intersect


I am using the geojson file from the [OpenData Vancouver][1] website and I am trying to find the zoning classifications that fall within 5 kms of a "Historical Area".

So, I am buffering all historical areas by 5kms (My data is projected), performing the intersect operation and using the intersect results as an index:

buffs = gdf_26910[gdf_26910['zoning_classification']=='Historical Area']['geometry'].buffer(5000) 
gdf_26910[buffs.intersects(gdf_26910['geometry'])]

However, this is the output I am getting:

    zoning_category     zoning_classification   zoning_district     object_id   geometry    area    centroid    buffer5K
87  HA  Historical Area     HA-1A   78541   POLYGON ((492805.516 5458679.305, 492805.038 5...   3848.384041     POINT (492778.785 5458699.947)  POLYGON ((497803.807 5458548.605, 497803.124 5...
111     HA  Historical Area     HA-3    78640   POLYGON ((491358.402 5458065.050, 491309.735 5...   66336.339719    POINT (491183.139 5458103.162)  POLYGON ((492818.045 5453267.595, 492421.697 5...
180     HA  Historical Area     HA-1A   78836   POLYGON ((492925.194 5458575.204, 492929.600 5...   90566.768532    POINT (492753.969 5458456.804)  POLYGON ((487583.872 5458086.263, 487564.746 5...
683     HA  Historical Area     HA-1    78779   POLYGON ((492925.194 5458575.204, 492802.702 5...   69052.427940    POINT (492606.372 5458621.753)  POLYGON ((487874.100 5456398.633, 487789.801 5...
1208    HA  Historical Area     HA-2    78833   POLYGON ((492332.139 5458699.308, 492346.989 5...   179805.027166   POINT (492343.437 5458944.412)  POLYGON ((489822.136 5454379.453, 489755.087 5...

Clearly, I am getting a match for the Historical Areas and not all the other geometries that intersect the buffers.

I have plotted the buffers and the output looks correct:

#Plot
base=gdf_26910.plot()
buffs.plot(ax=base, color='red', alpha=0.25)

[![enter image description here][2]][2]

I have also opened the data in QGIS and verified that there are 5 'Historical Areas' and they are all adjacent to 'Comprehensive Development'. So, the matching rows after the intersect operation should be "Comprehensive Development" at the least.

Where am I going wrong?


Solution

  • Two core points

    • need to work in meters for a 5km buffer. Hence have used estimate_utm_crs() for projection. Have also use cap_style and join_style for a more reflective buffered polygon.
    • have used sjoin() instead of mask approach in your code. This will effectively give duplicates, so de-dupe using pandas groupby().first()
    • UPDATE changed to predicate="within" and used folium to visualise (possibly helps you understand how geometry is working)
    import geopandas as gpd
    import folium
    
    gdf_26910 = gpd.read_file(
        "https://opendata.vancouver.ca/explore/dataset/zoning-districts-and-labels/download/?format=geojson&timezone=Europe/London&lang=en"
    )
    buffs = gdf_26910.loc[gdf_26910["zoning_classification"] == "Historical Area"]
    # buffer is defined as km, so need a CRS in meters...
    buffs = (
        buffs.to_crs(buffs.estimate_utm_crs())
        .buffer(5000, cap_style=2, join_style=3)
        .to_crs(gdf_ha.crs)
    )
    
    # this warns so is clearly bad !
    # gdf_26910[buffs.intersects(gdf_26910['geometry'])]
    # some geometries intersect multiple historical areas,  take first intersection from sjoin()
    gdf_5km = (
        gdf_26910.reset_index()
        .sjoin(buffs.to_frame(), predicate="within")
        .groupby("index")
        .first()
        .set_crs(gdf_26910.crs)
    )
    
    m = buffs.explore(name="buffer")
    gdf_5km.explore("zoning_classification", m=m, name="within")
    gdf_26910.explore("zoning_classification", m=m, name="all", legend=False)
    
    folium.LayerControl().add_to(m)
    m