I have defined three Shapely linestrings and three Shapely polygons, which overlap/intersect each other in various places as shown in the annotated image below.
My understanding is that a shapely 'difference' operation should return the parts of the lines that are outside of the polygons. I'm not sure why, but when I perform a 'difference' operation, it seems to be keeping part of a line that is within one of the polygons. This is shown in the following plot where I have compared the original polygons to the output of the difference operation.
Note that, similarly, if I run an 'intersection' operation, it is missing this small segment. Can anyone explain why this is the case? Code to generate everything shown above is as follows:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, LineString
#Define lines and polygons:
linkID = ['1','2','3']
link_geom = [LineString([(0, 0), (10, 10)]),LineString([(10, 10), (20, 10)]),LineString([(20, 10), (25, 15)])]
gdf_links = gpd.GeoDataFrame({'linkID':linkID,'geometry':link_geom})
polyID = ['100','200','300']
poly_geom = [Polygon([(2, 1), (2, 3), (4, 3), (4, 1)]),Polygon([(15, 7), (15, 13), (18, 13), (18, 7)]),Polygon([(19, 7), (19, 13), (21, 13), (21, 7)])]
gdf_poly = gpd.GeoDataFrame({'polyID':polyID,'geometry':poly_geom})
links = gdf_links.unary_union
polys = gdf_poly.unary_union
#Show plot of lines and polygons together:
gpd.GeoSeries([links,polys]).plot(cmap='tab10')
#Split links at polygons, keeping segments that are outside of polgyon:
difference = gdf_links.difference(gdf_poly).reset_index(drop=True)
#Plot resulting 'difference' vs original polygons:
diff = difference.unary_union
gpd.GeoSeries([diff,polys]).plot(cmap='tab10')
You are performing the 'difference' function on your three separate linke and polygons. So the first line is getting cropped by the first box only. The secone link is getting cropped by the second box only. The third line is getting cropped by the third box only. The solution to this is cropping the lines on the joined polygon dataset so they are all cropped against all boxes. You can change this on the line:
difference = gdf_links.difference(gdf_poly).reset_index(drop=True)
and change it to:
difference = gdf_links.difference(polys).reset_index(drop=True)