I have a large list of polygons (>10^6) most of which are non-intersecting but some of these polygons are hole of another polygon (~10^3 cases). Here a image to explain the problem, the smaller polygon is a hole in the larger polygon but both are independent polygon in the list of polygons.
Now I would like to efficiently determine which polygons are holes and substract the holes i.e. subtract the smaller polygons which lie completely inside in another polygon and return a list of "cleaned" polygons. A pair of hole and parent polygon should be transformed like this (so basically hole subtracted from the parent):
There are plenty of similar questions on Stackoverflow and gis.stackexchange.com but I haven't found one that actually solves this problems. Here are some related questions: 1. https://gis.stackexchange.com/questions/5405/using-shapely-translating-between-polygons-and-multipolygons 2. https://gis.stackexchange.com/questions/319546/converting-list-of-polygons-to-multipolygon-using-shapely
Here is a sample code.
from shapely.geometry import Point
from shapely.geometry import MultiPolygon
from shapely.ops import unary_union
import numpy as np
#Generate a list of polygons, where some are holes in others;
def generateRandomPolygons(polygonCount = 100, areaDimension = 1000, holeProbability = 0.5):
pl = []
radiusLarge = 2 #In the real dataset the size of polygons can vary
radiusSmall = 1 #Size of holes can also vary
for i in range(polygonCount):
x, y = np.random.randint(0,areaDimension,(2))
rn1 = np.random.random(1)
pl.append(Point(x, y).buffer(radiusLarge))
if rn1 < holeProbability: #With a holeProbability add a hole in the large polygon that was just added to the list
pl.append(Point(x, y).buffer(radiusSmall))
return pl
polygons = generateRandomPolygons()
print(len(pl))
Now how can I create a new list of polygons with the holes removed. Shapely provides functions to subtract one polygon from another (difference) but is there a similar function for lists of polygons (maybe something like unary_union but where overlaps are removed)? Alternative how to efficiently determine which are holes and then subtract them from the larger polygons?
Your problem is you don't know which ones are "holes", right? To "efficiently determine which polygons are holes", you can use an rtree to speed up the intersection check:
from rtree.index import Index
# create an rtree for efficient spatial queries
rtree = Index((i, p.bounds, None) for i, p in enumerate(polygons))
donuts = []
for i, this_poly in enumerate(polygons):
# loop over indices of approximately intersecting polygons
for j in rtree.intersection(this_poly.bounds):
# ignore the intersection of this polygon with itself
if i == j:
continue
other_poly = polygons[j]
# ensure the polygon fully contains our match
if this_poly.contains(other_poly):
donut = this_poly.difference(other_poly)
donuts.append(donut)
break # quit searching
print(len(donuts))