python opencv image-processing image-thresholding

denoising binary image in python

For my project i'm trying to binarize an image with openCV in python. I used the adaptive gaussian thresholding from openCV to convert the image with the following result:

I want to use the binary image for OCR but it's too noisy. Is there any way to remove the noise from the binary image in python? I already tried fastNlMeansDenoising from openCV but it doesn't make a difference.

P.S better options for binarization are welcome as well

Solution

You should start by adjusting the parameters to the adaptive threshold so it uses a larger area. That way it won't be segmenting out noise. Whenever your output image has more noise than the input image, you know you're doing something wrong.

I suggest as an adaptive threshold to use a closing (on the input grey-value image) with a structuring element just large enough to remove all the text. The difference between this result and the input image is exactly all the text. You can then apply a regular threshold to this difference.