I am doing some OCR with Python. To get the coordinates of the letters in an image, I take the centroid of each region (returned by `regionprops` from `skimage.measure`). If the distance between one centroid and the other centroids is less than some value, I drop that region. I thought this would solve the problem of several regions nested inside one another. What I missed is that if a smaller region (e.g. just part of a letter) is detected first, all the bigger regions that may contain the whole letter are ignored. Here is my code:
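```python
from skimage.measure import regionprops

# label_image and region_width are defined earlier in the script
centroids = []  # column (x) coordinates of the regions kept so far
for region in regionprops(label_image):
    if len(centroids) == 0:
        centroids.append(region.centroid[1])
        # do some stuff...
    if len(centroids) != 0:
        distances = []
        for centroid in centroids:
            distance = abs(centroid - region.centroid[1])
            distances.append(distance)
        # keep the region only if it is far enough from every kept centroid
        if all(i >= 0.5 * region_width for i in distances):
            # do some stuff...
            centroids.append(region.centroid[1])
```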
Now my question is: is there a way to sort the list returned by `regionprops` by area, and how do I do it? Or is there a better way to avoid the problem of a region inside other regions? Thanks in advance.
The Python built-in `sorted()` takes a `key=` argument, a function that computes the value to sort by, and a `reverse=` argument to sort in decreasing order. So you can change your loop to:
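```python
for region in sorted(
    regionprops(label_image),
    key=lambda r: r.area,  # sort regions by their pixel count
    reverse=True,          # largest regions first
):
    ...  # rest of the loop body unchanged
```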
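A minimal self-contained sketch of the sorted loop, assuming the same column-only centroid distance and `0.5 * region_width` threshold as in the question. The function name `filter_by_centroid`, its `largest_first` flag and the toy label image are hypothetical, made up for illustration: label 1 is a small fragment that shares its centroid column with the whole letter labelled 2, so the visiting order decides which of the two survives.

```python
import numpy as np
from skimage.measure import regionprops


def filter_by_centroid(label_image, region_width, largest_first=True):
    """Keep a region only if its centroid column is at least
    0.5 * region_width away from every region kept so far."""
    regions = regionprops(label_image)
    if largest_first:
        # Visit big regions (whole letters) before small ones (fragments).
        regions = sorted(regions, key=lambda r: r.area, reverse=True)
    kept = []
    for region in regions:
        col = region.centroid[1]  # x coordinate of the centroid
        if all(abs(col - k.centroid[1]) >= 0.5 * region_width for k in kept):
            kept.append(region)
    return kept


# Hypothetical toy label image (labels assigned by hand).
label_image = np.zeros((10, 30), dtype=int)
label_image[8:10, 4:6] = 1   # small fragment, centroid column 4.5
label_image[1:7, 2:8] = 2    # whole letter, centroid column 4.5
label_image[1:7, 20:25] = 3  # another letter, centroid column 22.0

print([r.label for r in filter_by_centroid(label_image, 6, largest_first=False)])
# [1, 3]: the fragment is seen first, so the whole letter is dropped
print([r.label for r in filter_by_centroid(label_image, 6)])
# [2, 3]: sorting by area keeps the whole letter instead
```

Since `regionprops` yields regions in label order, the unsorted run keeps whichever region happens to carry the lower label; sorting by area makes the larger region win regardless of labelling.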
To check whether one region is completely contained in another, you can use `r.bbox` (for a 2D label image, the tuple `(min_row, min_col, max_row, max_col)`) and check whether one box lies inside another or overlaps it.
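A minimal sketch of those bounding-box checks, assuming a 2D label image so that `bbox` is `(min_row, min_col, max_row, max_col)` with the max values exclusive. The helper names `bbox_contains`, `bbox_overlaps` and `drop_nested` are hypothetical. Regions are visited largest first, and a region is dropped if its box lies inside the box of a region already kept. The toy image is a square ring (like an "O") with a separate dot in its hole.

```python
import numpy as np
from skimage.measure import label, regionprops


def bbox_contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def bbox_overlaps(a, b):
    """True if the half-open boxes `a` and `b` share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def drop_nested(regions):
    """Keep regions largest first, skipping any region whose bounding box
    is contained in the bounding box of a region already kept."""
    kept = []
    for region in sorted(regions, key=lambda r: r.area, reverse=True):
        if not any(bbox_contains(k.bbox, region.bbox) for k in kept):
            kept.append(region)
    return kept


# Hypothetical toy image: a square ring with a separate dot in its hole.
image = np.zeros((12, 12), dtype=bool)
image[1:11, 1:11] = True
image[3:9, 3:9] = False  # cut out the hole
image[5:7, 5:7] = True   # dot, not connected to the ring

regions = regionprops(label(image))
kept = drop_nested(regions)
print(len(regions), len(kept))  # prints: 2 1
```

Box containment is only a proxy for pixel containment: here the dot is dropped because its box sits inside the ring's box, even though its pixels never touch the ring. Use `bbox_overlaps` instead of `bbox_contains` if partially overlapping regions should also be merged.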
Finally, if you have a lot of regions, I recommend building a `scipy.spatial.cKDTree` with all the centroids before running your loop: checking whether a region is close to existing ones then becomes a tree query instead of a comparison against every kept centroid in turn, which is much faster.
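A sketch of the `cKDTree` approach, assuming, as in the question, that only the column coordinate of each centroid matters, so the tree is built over 1-D points of shape `(n, 1)`. The function name `filter_with_tree` and the toy label image are hypothetical. The tree is built once over all centroids; for each region, largest first, `query_ball_point` returns the indices of the centroids within `0.5 * region_width`, and the region is kept only if none of those neighbours has been kept already. Unlike the `>=` test in the question, `query_ball_point` also counts points at exactly that distance as neighbours.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage.measure import regionprops


def filter_with_tree(label_image, region_width):
    regions = sorted(regionprops(label_image), key=lambda r: r.area, reverse=True)
    if not regions:
        return []
    # One 1-D point per region: the centroid column (x coordinate).
    cols = np.array([[r.centroid[1]] for r in regions])
    tree = cKDTree(cols)
    kept = np.zeros(len(regions), dtype=bool)
    for i in range(len(regions)):
        # Indices of all regions whose centroid column is close to this one
        # (this includes i itself, which is not marked as kept yet).
        neighbours = tree.query_ball_point(cols[i], r=0.5 * region_width)
        if not kept[neighbours].any():
            kept[i] = True
    return [r for r, k in zip(regions, kept) if k]


# Hypothetical toy label image (labels assigned by hand).
label_image = np.zeros((10, 30), dtype=int)
label_image[8:10, 4:6] = 1   # small fragment, centroid column 4.5
label_image[1:7, 2:8] = 2    # whole letter, centroid column 4.5
label_image[1:7, 20:25] = 3  # another letter, centroid column 22.0

print([r.label for r in filter_with_tree(label_image, 6)])  # [2, 3]
```

Building the tree over all centroids up front, rather than over only the kept ones, means it never has to be rebuilt inside the loop; the boolean `kept` mask records which neighbours actually count.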