python opencv image-processing scikit-image watershed

Detection of too many tiny areas when measuring large particles in binary images

I am moving from ImageJ to Python for doing image processing of images showing particles - either on a bench or falling - and trying to write my first code for 1) opening images of particles, 2) applying thresholding and noise reduction to obtain binary images, 3) apply watershed to separate overlapping particles, and 4) measure the detected regions to get information on particle size distributions (e.g., area, perimeter, axes).

I am working on the attached image for now, to keep things simple. It's a bunch of particles from 0.5 to 1 mm in diameter on a bench.

Particles 0.5-1 mm

Now, I wrote this code in Python that uses a mix of opencv and skimage functions to create and polish binary images, apply watershed (Not needed for the example above, but for the future - where I will image falling particles at high-speed - it'll be very useful), and measure the identified cells.

Edit: attached a second image where I have the same issue, but for smaller particles (0.250-0.5 mm, but where watershed is important)'

Smaller particles where I need watershed

EDIT2: Shorter code to show the main processing leading to the issue.

import cv2
import numpy as np
import pandas as pd
import os
from skimage.segmentation import watershed, clear_border
from skimage import measure, color, io, morphology

# Define the folder path containing the images
image = "C:/..."

# Set scale pixels/mm (either based on ImageJ or by reference image, or calculations)
px_per_mm = 222
mm_per_px = 1 / px_per_mm
img = cv2.imread(image)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

#Threshold and remove noise
thresholded_img =  cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

kernel = np.ones((5, 5),np.uint8)
image = cv2.morphologyEx(thresholded_img, cv2.MORPH_OPEN, kernel)
image = clear_border(image)

# Prepare for Watershed
sure_bg = cv2.dilate(image,kernel, iterations=4)
dist_transform = cv2.distanceTransform(image, cv2.DIST_L2, 0)
ret2, sure_fg = cv2.threshold(dist_transform, 0.4 * dist_transform.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)
ret3, markers = cv2.connectedComponents(sure_fg, connectivity=8)
markers = markers + 10
markers[unknown == 255] = 0

# Now we are ready for watershed filling.
markers = cv2.watershed(img, markers)

#Measure properties
label_image = measure.label(markers,background=255, connectivity=None)
props = measure.regionprops_table(label_image, image,
                          properties=['label', 'Area'])

# Scale properties from pixels to mm
props['Area'] = props['Area'] * mm_per_px ** 2

# Remove the image frame that is usually identified as a region
max_area_threshold = 700
props_df = pd.DataFrame(props)
filtered_props = props_df[props_df['Area'] <= max_area_threshold]
# Convert the filtered properties to a DataFrame
particle_analysis = pd.DataFrame(filtered_props)

Now, the current code gives me this distribution for the particles area:

Area distribution

It is mostly correct, but showing a lot of areas in the 0.0-0.5 bin, that are not present in the image once the noise and the small particulate is eliminated. With ImageJ, I would get similar results but without the huge initial bin: ImageJ results

If I check the binary image, watershed etc, it all seem fine, so no idea why is detecting such small regions. It happens with all the images I am using for testing the code.

Any help in identifying the issue is most welcome! Many thanks!

I wrote the code based on tutorials on opencv and skimage, and tested several ways of thresholding, reducing noise, morphology operations and changing parameters of the functions for measuring the regions, trying to only detect the real particles in the analysis. However, I always get that initial bin with apparently regions characterized by very small areas

Solution

The line

image = cv2.morphologyEx(thresholded_img, cv2.MORPH_OPEN, kernel)

removes small objects, and smooths out the boundary of larger objects. But the objects being removed are smaller than a 5x5 pixel square, which is 0.0005 mm². The ImageJ code you're replicating somehow filters out much larger particles. When I look at the size distribution for the particles smaller than 0.1 mm², I see something that likely is drawn from a log-normal distribution, with particles in the range 0.006-0.015. So you could remove all particles smaller than, say 0.02 mm² (~31x31 pixels).

You can accomplish that in two ways:

Use a much larger structuring element in the opening. This is problematic because OpenCV is very slow with larger structuring elements, and because a larger opening will distort the area of larger objects. If you go this route, you should use a circular structuring element, not a square one.*
Filter out smaller particles afterwards, like you do with larger objects. Just add a minimum area threshold to this line:

filtered_props = props_df[props_df['Area'] <= max_area_threshold]

There's a third option, which is that these smaller particles are actually part of your sample and you should be measuring them...

^{* I would also suggest you use DIPlib instead of OpenCV, it's opening is much faster with larger structuring elements. But I'm biased because I'm quite deeply involved in that project.}