Search code examples
pythonshapely

Incorrect leftover point when subtracting overlap between two polygons


I identify polygons that intersect, then compute their intersection and remove them from the first polygon. I get a weird left-over point. If I just take the difference between the polygons, this doesn't happen.

Why?

import pandas as pd
import geopandas as gpd
import shapely

test = pd.DataFrame({"geo_uuid": ["a", "b"], "geometry": [shapely.from_wkt("POLYGON ((-85.489530540446 40.72197039620563, -85.54038813980434 40.686016211216575, -85.42963027177998 40.682309666676865, -85.489530540446 40.72197039620563))"),
                                                            shapely.from_wkt("POLYGON ((-85.52717584228726 40.732495186799525, -85.49542480146151 40.693215026642285, -85.44637022730953 40.73203119983377, -85.52717584228726 40.732495186799525))")]})

test = gpd.GeoDataFrame(test).set_geometry("geometry").set_crs(epsg=4326)

geo_1 = test.geometry.values[0]
geo_2 = test.geometry.values[1]

overlap = geo_1.intersection(geo_2)
geo_1_overlap_removed = geo_1.difference(overlap)  # this will have an extra point contained in geo_2

geo_1_minus_geo_2 = geo_1.difference(geo_2)  # this is good

Why does this happen? What am I doing wrong?


Solution

  • This is due to coordinates not having infinite precision.

    To illustrate, plots of your different polygons at play here are included below...

    When you compare the first and the second plot, you can see that the left-most point of the 2nd plot is a point that is calculated as the intersection of two boundary lines of the original polygons.

    However, because a computer doesn't use infinite precision, in some cases this calculated point won't be exactly on the mathematical intersection, but ~a nanometer next to it. In this example it ended up ~ a nanometer to the right of it, so when the difference is calculated, a very thin triangle is not erased.

    An option to deal with this is to use shapely.set_precision or geopandas.set_precision. These functions will round all coordinates to the precision specified + will remove all parts of a polygon that are narrower than roughly this precision. For this example, the very narrow triangle will be removed. Check out plot 3 and 4 (the orange ones) below.

    plots of polygons

    Code sample:

    from matplotlib import pyplot as plt
    import pandas as pd
    import geopandas as gpd
    import shapely
    import shapely.plotting as plotter
    
    test = pd.DataFrame({"geo_uuid": ["a", "b"], "geometry": [shapely.from_wkt("POLYGON ((-85.489530540446 40.72197039620563, -85.54038813980434 40.686016211216575, -85.42963027177998 40.682309666676865, -85.489530540446 40.72197039620563))"),
                                                                shapely.from_wkt("POLYGON ((-85.52717584228726 40.732495186799525, -85.49542480146151 40.693215026642285, -85.44637022730953 40.73203119983377, -85.52717584228726 40.732495186799525))")]})
    
    test = gpd.GeoDataFrame(test).set_geometry("geometry").set_crs(epsg=4326)
    
    geo_1 = test.geometry.values[0]
    geo_2 = test.geometry.values[1]
    
    overlap = geo_1.intersection(geo_2)
    geo_1_overlap_removed = geo_1.difference(overlap)  # this will have an extra point contained in geo_2
    geo_1_overlap_removed_prec = shapely.set_precision(geo_1_overlap_removed, grid_size=1e-9)
    
    geo_1_minus_geo_2 = geo_1.difference(geo_2)  # this is good
    
    print(geo_1_overlap_removed)
    print(geo_1_minus_geo_2)
    
    fig, ax = plt.subplots(ncols=5, sharex=True, sharey=True)
    plotter.plot_polygon(geo_1, ax=ax[0])
    plotter.plot_polygon(geo_2, ax=ax[0], color="red")
    plotter.plot_polygon(overlap, ax=ax[1], color="red")
    plotter.plot_polygon(geo_1_overlap_removed, ax=ax[2], color="orange")
    plotter.plot_polygon(geo_1_overlap_removed_prec, ax=ax[3], color="orange")
    plotter.plot_polygon(geo_1_minus_geo_2, ax=ax[4], color="green")
    
    plt.show()